DOCUMENT RESUME

ED 294 890                                        TM 011 469

AUTHOR        Natriello, Gary
TITLE         Evaluation Processes in Schools and Classrooms.
              Report No. 12. May, 1987.
INSTITUTION   Johns Hopkins Univ., Baltimore, Md. Center for Social
              Organization of Schools.
SPONS AGENCY  Office of Educational Research and Improvement (ED),
              Washington, DC.
PUB DATE      May 87
GRANT         OERI-G-86-0006
NOTE          98p.
PUB TYPE      Information Analyses (070) -- Viewpoints (120)
EDRS PRICE    MF01/PC04 Plus Postage.
DESCRIPTORS   Academic Achievement; *Classroom Research; Elementary
              Secondary Education; Evaluation Criteria; *Evaluation
              Methods; Literature Reviews; *Models; Research
              Design; Student Evaluation



ABSTRACT 

Literature relating to evaluation processes in 
schools and classrooms is reviewed to develop a conceptual framework 
for integrating research on such evaluation processes. This report 
was prepared by the Middle School program. Commentary and research on 
elements of the evaluation process are examined in this framework, 
and the ways in which formal programs and policies have impact on 
evaluation are considered. The framework here presented is summarized 
as: (1) establishing evaluation purposes; (2) assigning tasks to 
students; (3) setting student performance criteria; (4) setting 
performance standards; (5) sampling information on student 
performance; (6) assessing student performance; (7) providing 
feedback to students; and (8) monitoring outcomes of evaluation. This 
framework is a first step toward the comprehensive framework needed 
for effective evaluation. A 13-page list of references and four 
tables are presented. (SLD) 




Center for Research on Elementary & Middle Schools



Report No. 12 

May, 1987 

EVALUATION PROCESSES IN SCHOOLS 
AND CLASSROOMS 

Gary Natriello 





Center Staff 



Edward L. McDill, Co-Director 
James M. McPartland, Co-Director



Karl L. Alexander 
Henry J. Becker 
Barbara A. Bennett 
Jomills H. Braddock II 
Renee B. Castaneda 
Barbara S. Colton 
Russell L. Dawkins 
Doris R. Entwisle 
Joyce L. Epstein 
Anna Marie Farnish 
Denise C. Gottfredson 
Gary D. Gottfredson 



Edward J. Harsch 
John H. Hollifield 
Lois G. Hybl 
Nancy L. Karweit 
Melvin L. Kohn 
Nancy A. Madden 
Alejandro Portes
Robert E. Slavin 
Carleton W. Sterling 
Robert J. Stevens 
Tammi J. Sweeney 
Shi Chang Wu 



Center Liaison

Rene Gonzalez, Office of Educational Research and Improvement

National Advisory Board

Patricia A. Bauch, Catholic University of America
Jere Brophy, Michigan State University
Jeanne S. Chall, Harvard University
James S. Coleman, University of Chicago
Edgar G. Epps, University of Chicago
Barbara Heyns, New York University
David W. Hornbeck, Maryland State Department of Education
Michael W. Kirst, Chair, Stanford University
Rebecca McAndrew, West Baltimore Middle School
Sharon P. Robinson, National Education Association



Evaluation Processes in Schools and Classrooms 



Grant No. OERI-G-86-0006



Gary Natriello 
Teachers College, Columbia University



Report No. 12 



May 1987 



Published by the Center for Research on Elementary and Middle
Schools, supported as a national research and development center by
funds from the Office of Educational Research and Improvement, U.S.
Department of Education. The opinions expressed in this publication
do not necessarily reflect the position or policy of the OERI, and
no official endorsement should be inferred.



Center for Research on Elementary and Middle Schools 
The Johns Hopkins University 
3505 North Charles Street 
Baltimore, Maryland 21218 



Printed and assembled by: 

VSP Industries 
2440 West Belvedere Avenue 
Baltimore, Maryland 21215 






The Center 



The mission of the Center for Research on Elementary and Middle
Schools is to produce useful knowledge about how elementary and
middle schools can foster growth in students' learning and development,
to develop and evaluate practical methods for improving the
effectiveness of elementary and middle schools based on existing and new
research findings, and to develop and evaluate specific strategies
to help schools implement effective research-based school and
classroom practices.

The Center conducts its research in three program areas: (1)
Elementary Schools, (2) Middle Schools, and (3) School Improvement.

The Elementary School Program

This program works from a strong existing research base to 
develop, evaluate, and disseminate effective elementary school and 
classroom practices; synthesizes current knowledge; and analyzes 
survey and descriptive data to expand the knowledge base in effec- 
tive elementary education. 

The Middle School Program

This program's research links current knowledge about early
adolescence as a stage of human development to school organization
and classroom policies and practices for effective middle schools.
The major task is to establish a research base to identify specific
problem areas and promising practices in middle schools that will
contribute to effective policy decisions and the development of
effective school and classroom practices.

School Improvement Program 

This program focuses on improving the organizational performance 
of schools in adopting and adapting innovations and developing 
school capacity for change. 



This report, prepared by the Middle School program, develops a 
model for understanding and improving student evaluation processes 
in schools and classrooms, and reviews research on evaluation pro- 
cesses in terms of the model. 






Abstract 



This paper reviews literature relating to evaluation processes in
schools and classrooms. The review provides a conceptual framework
to integrate research on evaluation processes in schools and
classrooms, examines commentary and research on elements of the
evaluation process, and seeks to provide an understanding of how formal
programs and policies affect evaluation processes.






Introduction



The evaluation of student performance is a central task of
schools and teachers. Indeed, evaluation activities permeate the
educational process. Although this is now particularly apparent, as
schools are under increased pressure for greater accountability and
improved performance, the pressure on and interest in evaluation
processes are nothing new. Throughout the history of American
education, evaluation of student performance has been an element of
enduring concern to educators, to students, and to parents (Crooks,
1933), and social scientists and educators have amassed a considerable
body of research and commentary related to the evaluation process.

Such work appears under a number of different rubrics — from 
testing, accountability, and standards to incentives, grading, and 
marking. Evaluation processes include those initiated and directed 
by teachers as well as those sponsored by the school, the school 
district, accrediting agencies, and state and federal governments. 

This review 1) provides a conceptual framework to integrate
research on evaluation processes in schools and classrooms; 2) examines
commentary and research on elements of the evaluation process
in terms of that conceptual framework; and 3) develops some
understanding of the ways in which formal programs and policies have an
impact on evaluation processes in schools and classrooms.






Conceptual Framework for School and Classroom Evaluation Processes

Evaluation processes can be conceptualized in many ways. For 
example, evaluation might be considered as an interpersonal process
with important implications for individual motivation, as a social
and organizational process with substantial effects on social and
institutional stability, or as a political process with an impact on
the distribution of power and resources in a system (Natriello,
1985). A framework adopted to consider evaluation processes in
schools and classrooms might contain elements of each of these 
approaches. We will emphasize a framework which permits the organi- 
zation and presentation of theory and findings on how educators 
structure the process of evaluation and the likely outcomes of the 
structure that is adopted. 

Figure 1 depicts the key elements in the framework for consider- 
ing evaluation processes in schools and classrooms. 



Insert Figure 1 About Here 



The purposes of student evaluation can be many and varied, and 
play an important role in determining the nature of the evaluation 
activities. The assignment of academic tasks to students sets the 
stage for the evaluation activities that follow. Through the process
of assignment, students are put on notice that they are
expected to perform a certain task. But to attempt to respond to
teachers' expectations, they need information on the nature of the
desired performance: they need criteria that are specified for the
task performance which tell them what aspects of the performance are
important to the teacher. Information for task performance also
comes from standards that communicate the level of performance that
students are supposed to achieve. With tasks assigned and criteria
and standards established and communicated, students are in a
position to engage in the appropriate activities.

Collecting information on student performance of assigned tasks 
and the outcomes of those tasks involves a sampling process, because 
total information is typically impractical or impossible to collect.
The sample of information on student performance may be used in
conjunction with the criteria and standards as evaluators actually
develop an appraisal of the student performance. Once the appraisal 
is developed, it still remains for the evaluator to communicate the 
results of the appraisal to the student performer. The feedback 
process might then lead to a number of outcomes which, presumably, 
relate to the original purposes of the evaluation process. 

This model of the evaluation process lays out in a generic way
the various elements of evaluation. It is not unlike other models
of evaluation and control processes (e.g., Lawler, 1976) and models
of cybernetic feedback processes (e.g., Bloom, 1980). But it also
suggests how the various elements may be related to one another.
For example, the purposes of the evaluation process are likely to
influence how tasks are assigned, the kinds of criteria that are
set, how samples of student performance are collected, the appraisal
process, and the nature of the feedback provided to students. But
the model does not suggest that the stages must take place only in
the order portrayed. The circular arrangement of elements conveys
the notion that certain evaluation procedures are adopted for
historical and idiosyncratic reasons that may have little to do with
other procedures. For example, the sampling of student performance
might derive from established procedures that limit the purposes to
which the evaluation can be put. (Glaser, 1963, observes the
inappropriate use of norm-referenced tests to assess the effects of
educational programs.) So too, the mechanisms for providing feedback
to students may stem from tradition and provide poor information for
assessing performance in terms of certain criteria. Critics of
traditional report cards have charged that they provide little
insight for students and parents interested in working to improve
performance (Giannangelo and Lee, 1974). Thus, the model of
elements of the evaluation process describes a rational progression for
the process, but also reveals the somewhat less than rational nature
of the process as it operates in schools and classrooms.

The Purposes of Student Evaluation in Schools and Classrooms

Aside from the obligatory brief section on the purposes of
evaluation at the front of texts on measurement and evaluation (e.g.,
Remmers, Gage, and Rummel, 1960; Lien, 1967; Ahmann and Glock, 1967),
the purposes of evaluation receive scant attention, which is
particularly ironic in a literature that encourages teachers to specify
educational goals and objectives as part of the evaluation process.
The lack of discussion and theoretical analysis of the purposes of
evaluation is consistent with the virtual absence of data on what
educators at all levels believe the purposes of evaluation to be or
of how they might prioritize multiple purposes.

The literature on the purposes of evaluation in schools and 
classrooms is a literature of lists and incidental notes produced by 
various commentators and researchers, some of whom have noted a pur- 
pose or two of evaluation by way of introduction to other issues. A 
synthesis of these lists and items produces the master list pre- 
sented in Table 1. 



Insert Table 1 About Here 



Four generic functions appear in these statements of the purposes
of evaluations and permit a reasonably parsimonious classification.
These functions are certification, selection, direction, and motiva- 
tion. Each represents a distinct purpose of the evaluation pro- 
cesses that occur in schools and classrooms. 

Certification refers to the assurance that a student or program 
has attained a certain level of accomplishment or mastery. At the 
program level, certification typically involves some type of
accreditation. At the individual level, certification might involve the
issuance of some sign of assurance such as a diploma or a
recommendation for promotion.

Selection is the identification of suitable individuals, subgroups,
and groups of individuals to be recommended for or permitted
to enter or continue along certain educational and occupational
paths. Evaluations are used to identify students for courses of
study, programs, higher educational opportunities, and various levels
of employment. At the program level, selection involves choices
among competing programs for continuing public support. The
expected outcome is the improvement of individual and program
performance.

Direction refers to the use of evaluation processes to communi- 
cate to those being evaluated the specific desires of the evalua- 
tors. Evaluations provide key information to focus the attention of 
those being evaluated, whether they be the students in a classroom
or the teachers and administrators implementing an educational pro- 
gram. Such information may be criteria that communicate the appro- 
priate emphases on tasks, or standards that communicate the desired 
level of performance. 

Motivation entails involving those being evaluated in the tasks
upon which the evaluation will be based. If the directing function 
or purpose of evaluations assures that individuals are aware of how 
they are expected to perform, the motivation function or purpose 
assures that individuals will be willing to commit the effort neces- 
sary to perform the task. 






These four purposes of evaluation (certification, selection,
direction, and motivation) have important effects on the other
elements of evaluation processes. Although no data exist on the
relative role of these four purposes in evaluation processes in
schools and classrooms, most evaluation systems reflect at least
some interest in each.

Student Tasks in Schools and Classrooms

The assignment of tasks to students is the beginning of the eval- 
uation process in classrooms — the student must first be given the 
responsibility for performing the task. While students generally 
work with a relatively stable set of tasks, the specific student 
tasks will constantly change if students are making the expected 
progress in their development. 

The task assignment process consists of several distinctly different
facets. Hackman (1969) defines a task as consisting of "...a
stimulus complex and a set of instructions which specify what is to
be done vis a vis the stimuli. The instructions indicate what
operations are to be performed by the subject(s) with respect to the
stimuli and/or what goal is to be achieved." Thus Hackman sees the
task as consisting of stimulus materials, instructions about
operations, and instructions about goals.

A similar approach is adopted by Dornbusch and Scott (1975), who
distinguish between tasks assigned by delegations and those assigned
by directives. The former involves specifying a goal and permitting
the performer to make at least some non-trivial decisions about how
to attain that goal. The latter involves the selection of a path or
set of activities which are then communicated with the expectation
that the performer will carry out the prescribed course of action.



Doyle (1983:161) adds a third element to considerations of tasks 
in classrooms: 

The term "task" focuses attention on three aspects of students'
work: (a) the products students are to formulate, such as an
original essay or answers to a set of test questions; (b) the
operations that are to be used to generate the product, such as
memorizing a list of words or classifying examples of a concept;
and (c) the "givens" or resources available to students
while they are generating a product, such as a model of a finished
essay supplied by the teacher or a fellow student. Academic
tasks, in other words, are defined by the answers students
are required to produce and the routes that can be used to obtain
these answers.



Classrooms and schools are dominated by tasks. Doyle (1983:162) 
argues that tasks are crucial features of schools and classrooms, 
"that tasks form the basic treatment unit in classrooms" and that: 

1. Students' academic work in school is defined by the academic
   tasks that are embedded in the content they encounter on a
   daily basis. Tasks regulate the selection of information
   and the choice of strategies for processing that information.
   Thus, "changing a subject's task changes the kind of
   event the subject experiences" (Jenkins, 1977:425).

2. Students will learn what a task leads them to do, that is,
   they will acquire information and operations that are
   necessary to accomplish the tasks they encounter (see
   Frase, 1972, 1975). In other words, accomplishing a task
   has two consequences. First, a person will acquire
   information -- facts, concepts, principles, solutions -- involved in
   the particular task that is accomplished. Second, a person
   will practice operations -- memorizing, classifying,
   inferring, analyzing -- used to obtain or produce information
   demanded by the task.






Because the nature of student tasks has such a pervasive influence
on the classroom, it is important to understand the tasks which
dominate the academic work of students. Doyle (1983:162-163)
identifies four general types of academic tasks in classrooms:

1) memory tasks in which students are expected to recognize or
   reproduce information previously encountered (e.g., memorize
   a list of spelling words or lines from a poem);

2) procedural or routine tasks in which students are expected
   to apply a standardized and predictable formula or algorithm
   to generate answers (e.g., solve a set of subtraction
   problems);

3) comprehension or understanding tasks in which students are
   expected to (a) recognize transformed or paraphrased
   versions of information previously encountered, (b) apply
   procedures to new problems or decide from among several
   procedures those which are applicable to a particular
   problem (e.g., solve "word problems" in mathematics), or (c)
   draw inferences from previously encountered information or
   procedures (e.g., make predictions about a chemical reaction
   or devise an alternative formula for squaring a number);

4) opinion tasks in which students are expected to state a
   preference for something (e.g., select a favorite short story).



The academic tasks that dominate schools and classrooms have
important implications for evaluation and control processes.
Certain characteristics of academic tasks are particularly likely to
affect the operation of the evaluation system in a classroom. For
example, Dornbusch and Scott (1975:80) have suggested that tasks
differ in predictability, that is, "the extent to which the
performer has knowledge of which path is most likely to lead to
success."<1> They argue that the greater the predictability of a task,
the more likely that it will be assigned by a directive which
specifies the path or procedures to be followed in executing the task.
On the other hand, tasks that are low in predictability will be
more likely to be assigned by a delegation which specifies a desired
end state or goal and gives the performer the autonomy to make
non-trivial decisions about how to attain that end state. Thus the task
assignment that starts the evaluation process in motion would be
likely to differ depending upon the nature of the task. Moreover,
Dornbusch and Scott (1975) demonstrate that when tasks are
predictable, performers prefer directives; when tasks are unpredictable,
performers prefer delegations.

Doyle (1979) linked the nature of tasks to the evaluation process
in terms of the ambiguity and risk associated with academic work in
classrooms. He argued that because academic tasks in classrooms
were performed in the context of an evaluation system, they were
performed under conditions of varying ambiguity and risk.
"Ambiguity refers to the extent to which a precise answer can be defined in
advance or a precise formula for generating an answer is
available.... Risk refers to the stringency of the evaluative criteria a
teacher uses and the likelihood that these criteria can be met on a
given occasion" (Doyle, 1983:183). He classified understanding and
opinion tasks as high in ambiguity and memory and routine tasks as
low in ambiguity. Opinion tasks and certain memory tasks (i.e.,
those involving the reproduction of small amounts of material) and
certain routine tasks (i.e., those requiring relatively simple
algorithms) were classified as low in risk. Understanding tasks and
other memory tasks (i.e., those involving the reproduction of large
amounts of material) and other routine tasks (i.e., those involving
complicated procedures) were classified as high in risk.
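
The two-way classification just described can be laid out as a small lookup table. The sketch below is illustrative only: the subtype labels (such as memory_small and routine_complex) and the function names are hypothetical conveniences introduced here, not terms used by Doyle, and the table records nothing beyond the low/high assignments stated in the paragraph above.

```python
from enum import Enum


class Level(Enum):
    LOW = "low"
    HIGH = "high"


# Hypothetical labels for the task subtypes named in the text.
# Each value is an (ambiguity, risk) pair taken from the paragraph above.
TASK_PROFILE = {
    "opinion": (Level.HIGH, Level.LOW),
    "understanding": (Level.HIGH, Level.HIGH),
    "memory_small": (Level.LOW, Level.LOW),      # small amounts of material
    "memory_large": (Level.LOW, Level.HIGH),     # large amounts of material
    "routine_simple": (Level.LOW, Level.LOW),    # relatively simple algorithms
    "routine_complex": (Level.LOW, Level.HIGH),  # complicated procedures
}


def profile(task_type: str) -> tuple[Level, Level]:
    """Return the (ambiguity, risk) pair for a task subtype label."""
    return TASK_PROFILE[task_type]


def tasks_with(ambiguity: Level, risk: Level) -> list[str]:
    """List, in alphabetical order, the subtypes that fall in one cell."""
    return sorted(
        name for name, pair in TASK_PROFILE.items() if pair == (ambiguity, risk)
    )
```

Read this way, the table makes one feature of the classification easy to see: only understanding tasks occupy the cell that is high on both ambiguity and risk.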






Academic tasks that are less predictable or that carry greater
ambiguity and risk place greater demands on evaluation processes.
Dornbusch and Scott (1975) and Thompson (1967) observe that when
organizational goals are ambiguous as opposed to crystallized,
performers may receive vague quality criteria. Such criteria often
result in an evaluation process that is "...arbitrary and post hoc
at every step, with the result that performers are unable to relate
the performances to the evaluations received" (Dornbusch and Scott,
1975:258). When the relationship between procedures or operations
and results or products is predictable, student performance can be
evaluated by collecting information on the results or products.
Indeed, such products can be designed for the convenience of the
teacher as an evaluator. This is even more convenient when the
tasks are low in ambiguity and thus have a clearly defined and
precise product. Such tasks could be evaluated relatively easily no
matter what the purpose of the evaluation: certification,
selection, direction, or motivation.

However, when tasks are low in predictability, the evaluation
cannot rely solely upon inspection of a product or result if the
purpose of the evaluation is to provide direction or enhance
motivation. Inspection of results for an unpredictable task or a task
high in ambiguity does not provide sufficient information for an
evaluator to use to help students improve their performance.
Moreover, evaluations of unpredictable tasks based on results or
products provide no information for the evaluator to determine the
effort and performance that led to the product. As a result, the
evaluator will not be able to structure the evaluation to maintain
or enhance the motivation of the student. This problem is
compounded by the fact that academic tasks generally involve mental
processes that are not readily visible to teachers in classrooms
(Natriello and Dornbusch, 1984).

Two processes appear to be set in motion by the strain that
unpredictable and ambiguous tasks place on evaluation. First, there
is a tendency to avoid unpredictable or ambiguous tasks in schools
and classrooms. Doyle (1983) reviews studies by Davis and McKnight
(1976) and Wilson (1976) which suggest that students resist the
shift from routine or procedural tasks to understanding tasks in
classrooms. After trying to make such a shift in a mathematics
class, Davis and McKnight (1976:282) commented that "It is no longer
a mystery why so many teachers and so many textbooks present ninth
grade algebra as a rote algorithmic subject. The pressure on you
to do exactly that is formidable." Besides resisting the
introduction of less predictable tasks, students also attempt to renegotiate
assigned tasks so that they are more predictable by soliciting more
information from the teacher on the specifics of the performance and
results desired.

Teachers may also devote less attention to less predictable tasks
when evaluating students. Natriello and Dornbusch (1984)
demonstrate that teachers and administrators present students with more
frequent and more challenging evaluations for behavior tasks, which
are conceived of as more predictable, than for academic tasks, which
are conceived of as less predictable. This same reasoning is used
by Holmes (1978) to explain why schools and teachers are less likely
to take seriously the evaluation of students in the affective
domain, where tasks are conceived of as even less predictable than
in the academic domain.

A second process may be set in motion by the strain that unpred- 
ictable tasks place on evaluation systems — the tendency to struc- 
ture evaluation activities as if the tasks being evaluated are pred- 
ictable and unambiguous. In a study of three reading curricula, 
Armbruster, Stevens, and Rosenshine (1977) found that although the 
texts emphasized comprehension and interpretation skills, the tests 
solicited factual information from students based on the ability to 
locate information in the text. Treating tasks as if they are pred- 
ictable simplifies the evaluation process. 

Although types of tasks differ in ways that affect evaluation 
processes in classrooms and schools, all academic tasks are complex. 
Reviewing recent research on tasks and cognitive development, Doyle 
(1983:173) points out that "In sum, school tasks, even at the level
of basic skills, are inherently complex for all students. This
complexity is much more severe, however, for young students and those
who lack either the information or the skills required to understand
tasks, process information in specific ways, or decide when to use
the strategies they possess." Such complexity carries important
implications not only for evaluation and control systems (Dornbusch
and Scott, 1975), but also for the structure of work groups and
organizations (Scott, 1981). More complex tasks require more
sophisticated evaluation processes to assess student performances
accurately.

The assignment of a task communicates to a student that he or she 
is responsible for performing that task. However, evaluators are
generally interested in more than the performance and completion of 
tasks (Dornbusch and Scott, 1975); they are also concerned with cer- 
tain properties and levels of the performance and of the final pro- 
duct. Thus, in addition to the task assignment, the evaluation sys- 
tem requires the setting of criteria and standards for a task. 

Considerable confusion surrounds these issues in the evaluation 
of student performance in schools and classrooms. One type of
confusion, the failure to clearly distinguish between criteria and
standards, was introduced in the literature on criterion-referenced
testing and has now become pervasive among researchers and
practitioners alike, no doubt testimony to the effectiveness of 
courses in tests and measurement. 

Glass (1978) traces the use of the term "standard" in the work of
Mager (1962) and Popham (1973) on instructional objectives, Bloom
(1968) on mastery learning, and Tyler (1973) on the role of testing
in assessment programs. In each case "standard" is used to refer to
a level of acceptable performance in behavioral terms. He next
considers the work of Glaser (1963) on criterion-referenced tests, work
in which Glaser assumed that there were continua of attainment
levels along which student performance could be described. Finally, he
argues that

Glaser's use of the word "criterion" with its colloquial
meaning of "standard," the simultaneous publication of Mager's
rather simple notions of performance standards, and Popham's
mixing of Glaser and Mager in the same pot combined to create
the impression that the "criterion" in criterion-referenced
testing was not the behavioral scale articulated to a test and
elaborating the meaning of the scores, but rather that the
"criterion" was the cut-off score, the division between pass
and fail, or competence and incompetence. This interpretation
of the word "criterion" is evident in the informal conversation
of both educators and measurement specialists. This meaning is
intended when people speak, as they do now habitually, of
"setting the criterion on a criterion-referenced test or test
item." (Glass, 1978:241)



The continuing confusion of the terms "criteria" and "standards" 
makes it particularly important to distinguish them in considering 
their role in evaluation systems. Criteria refer to the properties 
of the task that should be taken into account in making the evalua- 
tion (Dornbusch and Scott, 1975:138). A standard, on the other 
hand, refers to the evaluative scale whose ...

Intervals constitute degrees of acceptability or preference, 
the scale typically ranging from low scores indicating 'totally 
unacceptable' values at one end of the continuum to high scores 
indicating 'highly acceptable' or perhaps 'exceptional' values
at the other end. A standard may consist of a single point on 
the evaluative scale separating acceptable from unacceptable 
values. More typically, however, a standard consists of a set 
of points distinguishing various levels of acceptability or 
non-acceptability. In addition to the scale itself, the stan- 
dard also includes a set of rules to transform values on the 
performance dimension into scores on the evaluative scale 
(Dornbusch and Scott, 1975:140). 
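
The definition quoted above has three parts: an evaluative scale, one or more points dividing that scale into levels of acceptability, and a rule for transforming values on the performance dimension into scores on the scale. A minimal sketch of that structure follows. The class name, the numeric cut points, and the "marginal" label are hypothetical choices made for illustration, and the convention that a value exactly at a cut point falls into the higher level is an assumption, not something the source specifies.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Standard:
    """An evaluative scale plus a rule mapping performance values onto it."""

    cut_points: tuple[float, ...]  # ascending boundaries on the performance dimension
    labels: tuple[str, ...]        # scale levels; one more label than cut points

    def __post_init__(self):
        if len(self.labels) != len(self.cut_points) + 1:
            raise ValueError("need exactly one more label than cut points")
        if list(self.cut_points) != sorted(self.cut_points):
            raise ValueError("cut points must be in ascending order")

    def evaluate(self, performance: float) -> str:
        # Transformation rule (assumed): a value equal to a cut point
        # is placed in the higher of the two adjacent levels.
        return self.labels[bisect_right(self.cut_points, performance)]


# A single point separating acceptable from unacceptable values.
pass_fail = Standard(cut_points=(60.0,), labels=("unacceptable", "acceptable"))

# A set of points distinguishing several levels of acceptability.
graded = Standard(
    cut_points=(50.0, 70.0, 90.0),
    labels=("totally unacceptable", "marginal", "acceptable", "exceptional"),
)
```

Note that nothing in the sketch says which properties of a task are being measured; that is the role of the criteria, which is why the text treats criteria and standards as separate elements.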



A second type of confusion, also rooted in the traditional
treatment of behavioral objectives, involves the failure to distinguish
the dual aspects of a standard: the component related to the
levels of the important properties of the assigned task (i.e., the
criterion levels), and the component related to the collection of
information on the performance dimensions (i.e., the sampling
process). Discussions of behavioral objectives (e.g., Gronlund, 1971;
Krathwohl and Payne, 1971; Brown, 1970; Lindvall, 1961; Remmers,
Gage, and Rummel, 1960; Lien, 1967; Ahmann and Glock, 1967)
typically present levels of objectives that range from general
objectives (criteria) to specific objectives which have the desired
student behaviors clearly identified (indicators). Such presentations
unintentionally confuse the properties that are of interest to the
teacher or evaluator with the evidence of student performance in
terms of those properties and the standards for performance.

Such melding of criteria, standards, and indicators may make the 
development of learning objectives more concretely understandable to 
teachers, but it also locks them into an overly empiricist concep- 
tion of these relationships. The specific objectives (indicators) 
are interpreted as being in one-to-one correspondence with the gen-
eral objectives (criteria).<2> Under such circumstances, the indica-
tors of student performance can take on the role and importance of
the criteria themselves. Levine (1976) observes this process occur- 
ring in achievement testing in schools, both in the writing of 
experts who argue that achievement tests are absolute criteria in 
themselves (e.g., Lindquist, 1969) and in cases where the testing 
program has dictated school policy (Levine and Levine, 1970). To
avoid such an empiricist trap, the present conceptual scheme distin-
guishes among criteria, standards, and the indicators of perfor-
mance.

Serious discussions of the criteria properly associated with stu-
dent academic tasks tend to be specific to various content or curri-
cular areas. For example, Doyle (1983) reviews Culler's (1980) ana-
lysis of the criteria involved in competence in literature and
Fredriksen and Dominic's (1981) analysis of the criteria involved in
the composing process. Because criteria for the evaluation of stu- 
dent performance must identify the important properties of various 
student tasks, the criteria themselves must be specific to the tasks 
if they are to have meaning in the context of the work of students. 
Yet discussions of the evaluation of student work typically pay lit- 
tle attention to the specific tasks being evaluated. Rather, an 
evaluative technique is applied which may or may not be appropriate 
for the tasks in question. As noted earlier, the application of 
such techniques may then transform the nature of the tasks to con- 
form to the evaluation process. 

While there is little discussion of task-specific criteria for
evaluation in the evaluation literature, attention has been devoted 
to the types of criteria employed in the evaluation process. The 
achievement of students in a subject is generally accepted as the 
one criterion common to all evaluation systems in schools and class-
rooms (Brown, 1970). The appropriateness of using achievement
criteria is seldom discussed, although increased attention is being
paid to determining whether the evaluation process is linked to the
instructional process (Linn, 1983; Rudman, Kelly, Wanous, Mehrens,
Clark and Porter, 1980). The latter is generally accomplished by
matching the testing procedures to the goals and objectives of
instruction,<3> so that students are not subjected to an evaluation
process that involves things not covered in the instructional pro-
gram, a problem for both groups of students and for individual
students (Natriello, 1982).



Types of criteria other than achievement criteria enter into 
evaluation processes in schools and classrooms, but there is little
agreement as to which of these are appropriate. For example, Thorn-
dike (1967:762) notes that:

In practice, certainly many other considerations than that of 
pure competence do enter into marks. Such factors enter in as 
(1) industry and effort — i.e., completing all assigned work and 
even doing optional work for "extra credit" (a kind of educa- 
tional bribe); (2) frequent and active participation in class 
discussion; (3) neatness in written work and mechanical cor- 
rectness in such areas as spelling and grammar; and (4) per- 
sonal agreeableness, attractiveness, cleanliness, and docility. 
To some extent and by some instructors, certain of these fea-
tures would be endorsed as legitimate influences on a mark.
Others would more uniformly be accepted as extraneous influ- 
ences, to be minimized as far as possible. 



Holmes (1978) observes that criteria related to behavior and
effort, and particularly criteria such as politeness, conformity,
and perseverance (those things which make it possible for the
organization to operate), often covertly enter into the evaluation
process. He argues that more formal attempts should be made to
include criteria from the affective domain (such as attitudes,
values, and moral reasoning) in student evaluations. Brown and
Craig (1977) value these other criteria, but reject the notion that
they can be incorporated into systems for the evaluation of students
in schools and classrooms.

Several studies have examined the use of criteria other than 
achievement in evaluation systems for students. Schunk (1983) 
reports on an experiment in which students were subjected to three
types of evaluation systems: one in which they received rewards
for their actual performance, one in which they received rewards for
simply participating (i.e., honoring the task assignment), and one
in which they received no rewards. While the performance-contingent
reward system led to the highest levels of achievement, the system
of rewards for participation showed no benefit over the no-rewards
system. Salganik (1982) reports on a system in which students were
evaluated on three criteria: achievement, effort, and conduct.
Although the correlations among these three types of criteria were
high, in those cases in which there were discrepant evaluations
among the three criteria, the evaluations based on student effort
seemed to have some positive motivational effects on low-achieving
students. Weiner (1979), reviewing research on attributions and
motivation, found that evaluators placed greater importance on
effort than on ability in determining reward and punishment under
conditions in which performance was held constant. Natriello and
McPartland (1987), in a national survey of secondary school teach-
ers, found that student effort was "very important" or "extremely
important" in the evaluation process of over 70% of the teachers.



Types of criteria other than those related to achievement clearly 
play a part in the evaluation of students in schools and classrooms. 
Additional research should provide descriptions of these non-
achievement criteria and the ways in which they are used by teachers.
Because teachers report the use of multiple criteria (Natriello and
McPartland, 1987) and because students are assigned multiple tasks
in classrooms, some attention should be paid to the relative weight
assigned to various tasks and criteria in arriving at final evalua-
tions (Brown, 1970; Dornbusch and Scott, 1975).
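One simple way to make the relative weight of several criteria explicit is a weighted average, sketched below in Python. This is an illustration under stated assumptions, not a procedure described in the studies cited: the function name, the three criteria, the weights, and the scores are all hypothetical.

```python
def composite_score(scores, weights):
    """Weighted average of per-criterion scores (all on the same scale)."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical weights: achievement counts most, effort and conduct less.
weights = {"achievement": 0.6, "effort": 0.3, "conduct": 0.1}
student = {"achievement": 70.0, "effort": 90.0, "conduct": 80.0}

final = composite_score(student, weights)
print(round(final, 1))
```

Changing the weights changes the final evaluation even when the underlying scores stay the same, which is why the weighting scheme itself deserves attention.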

Standards 

The standards used in the evaluation of students have received 
considerable attention from both the public at large and educational 
researchers. Recently, there have been renewed calls for higher
standards in U.S. schools (National Commission on Excellence in Edu-
cation, 1983). Calling for "higher" standards requires only the
assumptions that current standards are too low, and that "higher"
standards will somehow lead to better educational outcomes,
assumptions for which there is reasonable evidence at least for some
groups of students (Natriello and Dornbusch, 1984; McDill,
Natriello, and Pallas, 1985). But calls for "higher" standards rely
upon current standards as a point of reference and thereby avoid a 
key area of controversy in the evaluation of students. 

Researchers and practitioners have produced a considerable body
of work which examines the appropriate reference point for estab-
lishing standards for evaluating students. This work, which typi-
cally considers various systems that might be employed for setting
standards, appears to grow out of a fundamental dilemma faced by
schools in evaluating student performance. This dilemma is clearly
described by Bidwell (1965:973) in discussing the school as a formal
organization. First, he notes that in order to produce a uniform
product, schools and teachers engage in the "universalistic and thus
uniform assessment of student accomplishment." Teachers present
organizational standards to students and evaluate their performance
in terms of those standards. However, presenting standards to stu-
dents and demanding their compliance may not be enough to promote
learning, in view of the fact that students may not see schools as
relevant to their immediate interests. Citing Waller (1932), Bid-
well (1965:979) points out that "motivation to learn...is very
largely a product of a close, warm relation between teacher and stu-
dent," and suggests that the nature of school organizations requires
teachers to exhibit instances of universalistic assessment, as well
as instances of more diffuse responsiveness, in order to be effec-
tive with students. Similarly, Varenne and Kelly (1976) see the
school as caught by the paradox characteristic of American culture
which incorporates both the belief in the equal endowment of all
with regard to certain inalienable rights and their unequal endow-
ment with regard to individual capacities. This paradox requires
the school to utilize universal criteria for evaluation and rewards
that are tailored to individual performance.

Research and commentary on appropriate standards for the evalua-
tion of student performance in schools and classrooms examine the
struggle to accommodate both universalism and individualism in a sin-
gle system. Out of this struggle have emerged three types of stan-
dards: those set in reference to the criterion level of a group,
those set in reference to some absolute criterion level, and those
set in reference to the previous criterion level of an individual
(Wise and Newman, 1975; Rheinberg, 1983; Thorndike, 1969). Discus-
sions of standards for the evaluation of students revolve around the
advantages and disadvantages of employing each of these three types. 

Norm- or group-referenced standards have been criticized by edu- 
cators and social scientists alike, perhaps because they have been 
in widespread use for such a long time. Terwilliger (1978) refers 
to the use of norm-referenced standards as "norm-referenced grading" 
and specifies four variations. The most commonly discussed approach 
is the use of a normal curve with a specific class, a practice that 
Terwilliger traces to the "scientific movement" in education in the 
1930's, an observation borne out by Crooks' 1933 account of then
current thinking on grading. In this method, teachers use the test 
scores of students in their class to create the normal or bell- 
shaped curve. This method is typically referred to as "grading on 
the curve" (Bresee, 1976). A second method relies on the same nor-
mal curve but includes the evaluative scores of a larger group of
students beyond the immediate class (e.g., all of the students
receiving similar instruction currently or in the recent past). A 
third variation is restricted to the immediate class, but assigns 
grades based on a distribution other than the normal curve. A 
fourth variation assigns grades using a distribution other than the 
normal curve and uses a reference group beyond the immediate class. 
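Terwilliger's four variations cross two choices: the reference group (the immediate class or a larger group) and the distribution of grades (the normal curve or some other distribution). A minimal Python sketch, offered only as an illustration, makes both choices explicit parameters. The function name, the percentage quotas standing in for a bell-shaped distribution, and the sample scores are hypothetical and are not taken from Terwilliger.

```python
def norm_referenced_grades(class_scores, quotas, reference_scores=None):
    """Assign grades by standing within a reference group.

    quotas: (grade, percent) pairs from best to worst, summing to 100.
    reference_scores: comparison group; defaults to the immediate class.
    """
    reference = class_scores if reference_scores is None else reference_scores
    n = len(reference)
    grades = []
    for score in class_scores:
        # Number of reference-group members who scored strictly higher.
        higher = sum(1 for r in reference if r > score)
        cumulative = 0
        assigned = quotas[-1][0]
        for grade, percent in quotas:
            cumulative += percent
            # Integer comparison avoids floating-point boundary errors.
            if higher * 100 < cumulative * n:
                assigned = grade
                break
        grades.append(assigned)
    return grades


# Hypothetical quotas: a bell-shaped split and a flat (non-normal) split.
BELL_QUOTAS = [("A", 10), ("B", 20), ("C", 40), ("D", 20), ("F", 10)]
FLAT_QUOTAS = [("A", 20), ("B", 20), ("C", 20), ("D", 20), ("F", 20)]

class_scores = [95, 88, 82, 75, 71, 68, 64, 60, 55, 40]
curve_grades = norm_referenced_grades(class_scores, BELL_QUOTAS)
flat_grades = norm_referenced_grades(class_scores, FLAT_QUOTAS)
print(curve_grades)
print(flat_grades)
```

Passing `reference_scores` switches from the immediate class to a larger comparison group, and swapping the quotas switches the distribution, so the four combinations correspond to the four variations. In every case a student's grade depends on how others performed rather than on any fixed level of performance.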


Additional varieties of norm-referenced standards have been iden-
tified by Michaels (1977) and Slavin (1977) while discussing class-
room reward structures. Michaels (1977) defines "individual compet-
ition" as a reward structure in which grades are assigned to
students based on their performances relative to those of classmates
and "group competition" as a reward structure in which grades are
differentially allocated to groups according to their relative per-
formance. Slavin (1977) designates similar reward structures as
"competitive reward structures" and "group competition," respec-
tively. Both analyses point out that norm-referenced standards can
apply to different levels in a system: individual students, groups
of students, programs, schools, etc.

Terwilliger (1978) links norm-referenced standards to what he 
terms the pragmatic philosophy, a viewpoint primarily concerned with 
practical choices and the consequences of such choices. An evalua- 
tion system which differentiates among individual students is opti- 
mal for identifying the available choices and their consequences. 
Thus, norm-referenced standards would appear to serve the purpose of
selection identified earlier. Rheinberg (1983) links norm-refer- 
enced standards to the rationale of psychological testing and the 
associated concerns for objectivity, reliability, and validity of a 
teacher's grading process. He notes that "Perhaps because of this 
orientation towards psychological testing theory, an implicit 
assumption was perpetuated: 'Correct' evaluation of academic
achievement has to be based on social comparison between students,
leading to a normal distribution of grades." (Rheinberg, 1983:185). 




Levine (1976:233-234) explains how the interests of educational psy-
chologists in producing distributions of scores amenable to statis-
tical analysis overrode the interests of teachers who would have
preferred scales which enabled them to see where students were and
where they had to go, and who would have preferred not to harm stu-
dents by using national norm-referenced standards which placed half
of them below the national standards.

The extent to which teachers actually use norm-referenced stan-
dards has received too little attention. Rudman, Kelly, Wanous,
Mehrens, Clark, and Porter (1980:32), after reviewing studies which
described teacher testing preferences (i.e., Yeh, 1978; Olejnik,
1979; O'Regan, et al., 1979; Nearine, 1970), conclude that "Those
studies that were descriptive tended to show a preference for norm-
referenced tests and the standard scores in which the results are
couched." On the other hand, in the national survey reported on by
Natriello and McPartland (1987), secondary school teachers rated
norm-referenced standards as less important than either criterion-
referenced standards or individually-referenced standards in deter-
mining student grades. In addition, Gullickson's (1982) survey of
South Dakota teachers revealed that only 10% of the respondents
reported grading on a curve. Rheinberg (1983) notes that teachers
who preferred norm-referenced standards tended to organize classroom
tasks so that all students engaged in uniform tasks to facilitate
comparisons and to view student achievement differences as very
clear and stable properties of students. Although further evidence
is necessary before reaching conclusions, it may be that teachers
seek to use formal achievement tests with norm-referenced standards
to balance their own criterion-referenced and individually-refer-
enced standards in the classroom.

Although little evidence exists on the extent to which norm-
referenced standards are actually used in schools and classrooms,
critiques of the practice abound. General critiques are provided by 
Bresee (1976) and Deutsch (1979). Bresee (1976) lists a series of 
problems with such standards: (1) the necessity of producing a nor- 
mal distribution of grades conflicts with the goal of having teach- 
ers produce improvement in all students in a class; (2) the distor- 
tion of the curriculum as teachers seek to diversify instructional 
objectives to produce a range of achievement in a class; (3) the 
diversion of student attention from the task at hand to the perfor- 
mance of other students; and (4) the introduction of false competi- 
tion because achievement is not really in limited supply. To these 
Deutsch (1979) adds: (1) the distortion of the testing process
so that tests take the form of contests in which all per-
formers participate under uniform conditions; (2) the lack of
rewards created by the artificial scarcity of good grades that is
likely to impede the development of students' sense of their own
value; and (3) the encouragement of competition which may be coun-
terproductive for tasks requiring cooperation and communication.

Implicit criticisms of norm-referenced standards have come from
advocates of criterion-referenced testing<4> (Glaser, 1963), who
tend to point out how unsuited such standards are for providing
insight into the effectiveness of educational treatments or pro-
grams; and from advocates of individually-referenced standards
(Beady and Slavin, 1981), who decry the deficiencies of relative
standards for providing direction and motivation for certain stu- 
dents. Thus, the critiques of norm-referenced or relative standards 
center around the application of those standards to purposes such as 
accreditation, direction, and motivation, for which they are ill- 
suited. 

Criterion-referenced or absolute standards have enjoyed a great
deal of attention due to the criterion-referenced testing movement. 
Glass (1978) observes that contemporary educational movements for 
accountability, mastery learning, assessment, competency-based edu-
cation, and minimal competence graduation requirements have received
increasing attention. Each of these approaches relies on some abso- 
lute set of standards. 

Terwilliger (1978) notes the forms that the use of absolute stan-
dards in the classroom can take. He identifies the "percent-correct
system" as an approach "in which 100 represents a perfect perfor-
mance and some arbitrarily designated value (e.g., 70) represents
the minimal 'passing' score. If letter grades are employed, grades
are defined in terms of specified ranges on the percent-correct
scale, e.g., A=95-100, B=87-94, C=78-86, D=70-79, F=69 or below."
(Terwilliger, 1977:31). A second approach attempts to build spe-
cific meaning into criterion-referenced systems by specifying the
minimal level of performance that is acceptable. A third, more lim-
ited, approach to absolute standards focuses on the quantity of a
certain task that is completed by a student.
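The percent-correct system can be sketched in a few lines of Python. The sketch below is illustrative only: the function name is hypothetical, the cutoffs use the lower bounds of the ranges quoted above, and the lowest category is read as a failing grade, which is an assumption about the quoted scale.

```python
# Lower bounds of the quoted ranges; anything below 70 is treated as failing.
CUTOFFS = [(95, "A"), (87, "B"), (78, "C"), (70, "D")]


def percent_correct_grade(correct, total):
    """Map a raw score to a letter grade on a fixed percent-correct scale."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    percent = 100 * correct / total
    for lower_bound, letter in CUTOFFS:
        if percent >= lower_bound:
            return letter
    return "F"


print(percent_correct_grade(19, 20))  # 95 percent correct
print(percent_correct_grade(35, 50))  # 70 percent correct
```

Unlike a norm-referenced scheme, the grade here depends only on the student's own score and the fixed cutoffs, so every student in a class could in principle earn the same grade.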

Discussions of reward structures by Michaels (1977) and Slavin 
(1977) suggest additional approaches to absolute standards. Micha- 
els (1977) uses the term "individual reward contingencies" to 
describe a reward structure in which the performance of individual 
students is compared to a previously established standard. He uses 
the term "group reward contingencies" to describe a reward structure 
in which the performance of each group is independently compared to 
a previously established standard. Slavin (1977) uses the term
"independent reward structure" to describe a reward structure in 
which the probability of a student's receiving a reward is unrelated 
to the probability of any other student receiving a reward (as when 
the performance of individual students is compared with a fixed 
standard). He uses the term "group contingencies" to describe a 
situation in which the group is evaluated against a fixed standard. 

Terwilliger (1978:23) associates absolute standards with the 
behaviorist perspective on education which argues that "the optimal 
conditions for learning require a highly structured individualized 
approach in which materials are presented in relatively discrete 
units." and "...stresses the need for identifying in advance: 1)
the precise objectives of instruction, 2) the exact instructional
objectives to be employed, and 3) the specification of the criteria 
used for judging whether the objectives have been attained." 




Few studies have examined the extent to which teachers actually
use absolute criteria in evaluating student performance. Approxi-
mately three-fourths of the teachers in the national sample examined
by Natriello and McPartland (1987) reported that absolute standards
for achievement were "very important" or "extremely important" in
arriving at a final grade for students in their classes. This is
consistent with Gullickson's (1982) finding that 78% of teachers in
his South Dakota sample reported using some kind of criterion-refer-
enced grading scheme. However, Rudman, et al. (1980) point out that
although commentary on test use suggests that teachers prefer cri-
terion-referenced tests over norm-referenced tests, descriptive stu-
dies show the opposite and only 35% of the teachers surveyed by Beck
and Stetz (1979) favored increased use of criterion-referenced
tests.

Discussions of absolute standards have paid considerable atten-
tion to various methods for arriving at mastery, competency levels,
or cutting scores (Berk, 1976; Hambleton, Swaminathan, Algina, and
Coulson, 1978; Meskauskas, 1976; Nedelsky, 1954). Glass (1978) and
Burton (1978) review various methods for determining where to set a
mastery level on a continuum and conclude that standards must be set
arbitrarily. Shephard (1976) concludes that current methods of set-
ting absolute standards all reduce to a form of norm-referenced
standards. The inability to set standards by other than arbitrary
means causes Glass to reject absolute standards on standardized
tests and argue for the use of improvement as a basis for evalua-
tion. Scriven (1978) provides a counter perspective which argues
that absolute standards are not totally arbitrary and may still be
employed in minimal competency testing. However, these arguments
serve to underscore the fact that absolute standards are proble-
matic, particularly in cases such as statewide testing programs in
which decisions about standards are removed from the informed pro-
fessional opinion of the teacher (Burton, 1978).

Individually-referenced or self-referenced standards are based on 
comparing a student's current performance with some other feature of 
the student. Terwilliger (1978) distinguishes two forms of self-
evaluation: comparing current performance with earlier performance
and assessing growth, and comparing current performance to a stu-
dent's ability. Terwilliger views the use of self-referenced stan-
dards as an attempt to "recognize individual differences, reward
effort and generally provide an environment which fosters interest 
and motivation" (Terwilliger, 1978:32). He associates it with the 
humanist view of education which is concerned with "the values, 
interests, and dignity of each individual student as a human being" 
(Terwilliger, 1978:24). 
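The two forms of self-referenced comparison can be contrasted in a short Python sketch. It is a hedged illustration only: the function names, the use of a simple difference for growth, the use of a ratio against an ability-based expectation, and all of the numbers are assumptions, since the source does not specify how growth or ability would be measured.

```python
def growth_score(current, earlier):
    """Change relative to the student's own earlier performance."""
    return current - earlier


def ability_ratio(current, expected):
    """Current performance as a fraction of an ability-based expectation."""
    if expected <= 0:
        raise ValueError("expected performance must be positive")
    return current / expected


# Two hypothetical students with the same current score of 75.
improving = growth_score(current=75, earlier=60)
declining = growth_score(current=75, earlier=85)
print(improving, declining)

# The same score judged against a hypothetical expectation of 60.
print(ability_ratio(current=75, expected=60))
```

Under an absolute or norm-referenced standard the two students would receive the same evaluation; under a self-referenced standard they would not, which is the source of the motivational claims made for this approach.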

Rheinberg (1983) links self-referenced or individually-referenced
standards to the work of European educational theorists such as Her-
bart and Pestalozzi. He provides quotes from each, "The teacher
does not compare his student with others but with the student him-
self" (Herbart, 1831:10) and "I was patient with the slowest lear-
ner; but if one of the students did something worse than before I
was harsh" (Pestalozzi, 1807:426), to illustrate their longitudi-
nal perspective on individual standards for evaluation.




It is unclear to what extent teachers employ individually-refer-
enced standards in evaluating student performance, though such stan-
dards do play a role in evaluation processes in classrooms. Rudman
et al. (1980) found that 77% of the teachers in the study by Beck
and Stetz (1979) favored using standardized test data to measure
student growth. Natriello and McPartland (1987) report that about
three-fourths of the teachers in their national sample rated self-
referenced standards as "very important" or "extremely important" in
arriving at final grades. Rheinberg (1983) found that teachers who
did report a preference for self-referenced standards tended to
individualize classroom tasks and to view student achievement as
flexible and present-oriented. Finally, a number of investigators
have developed programs to establish individually-referenced evalua-
tion processes in classrooms (Hansen, 1977; Beady, Slavin, and Fen-
nessey, 1981). The preponderance of work on individually-referenced
standards suggests that they are particularly appropriate to the
purposes of motivation and direction noted earlier.

Collecting Information on Student Performance

In a rationally ordered evaluation process, once decisions have 
been made about the purposes of evaluation, the tasks, the criteria, 
and standards, an evaluator would be in a good position to consider 
the appropriate strategy for collecting information on student per-
formance. The collection of such information requires a sampling
process because it would be impractical if not impossible to collect 
total information on student performance. Most of the important
decisions about the collection of performance information thus
involve sampling decisions to ensure that the information collected
provides a valid and reliable estimate of performance appropriate to
the purposes, tasks, criteria, and standards that have been already
determined. Of course, in many instances the evaluation process is
not rationally ordered and decisions on the collection of informa-
tion on student performance seem poorly articulated with the pur-
poses, tasks, criteria, and standards.

The dominant technique for collecting information on student per-
formance is some form of testing. This is true at the federal and
state levels, where formal assessment programs have proliferated in 
recent years; at the district level, where school administrators and 
local boards of education have become increasingly concerned with 
the performance of the system; and only slightly less true at the 
classroom level, where teachers rely on their own tests for a number 
of reasons (Herman and Dorr-Bremme, 1984; Rudman, et al., 1980). 

A number of analysts have contributed important observations
about the relationship between testing practices and the purposes,
tasks, criteria, and standards for the evaluation of students.
Deutsch (1979) argues that the structure of most testing situations
is dictated by the prevailing purpose of evaluation (selection) and
the types of standards utilized (norm-referenced). He notes that:

The social context of most educational measurement is that of a
contest in which students are measured primarily in comparison
with one another rather than in terms of objective accomplish-
ment. If educational measurement is not mainly in the form of
a contest, why are students often asked to reveal their know-
ledge and skills in carefully regulated test situations
designed to be as uniform as possible in time, atmosphere, and
conditions for all students? Individuals vary enormously in
terms of the amount of time they need and the kind of atmo-
sphere and circumstances that facilitate or hinder the expres-
sion of their knowledge and skills; it is only the comparison
of students with one another that requires measures of educa-
tional achievement to take the form of contests (Deutsch,
1979:394).

Deutsch goes on to describe the damaging effects of norm-refer- 
enced standards for individual students and advocates an evaluation 
system that would provide individualized, particularistic feedback 
to students to foster their development. Thus, his objection to the 
typical testing situation is rooted in a rejection of the selective 
purpose and the norm-referenced standards that characterize much 
evaluation in schools and classrooms in favor of individually-refer- 
enced standards that might contribute to student motivation. 

Others have also rejected testing strategies rooted in norm-re- 
ferenced standards while advocating a criterion-referenced approach. 
These discussions typically object to the selection of items for 
standardized tests, which is a sampling strategy in itself. Popham
and Husek (1969) point out that the appropriate strategy for sam- 
pling items for tests when the standards are norm-referenced is to 
select items which maximize the variability of performance among the 
individuals taking the test. Hambleton, et al. (1978) observe that 
criterion-referenced tests are not constructed to maximize the vari- 
ability of test scores, so the resulting distributions will tend to 
be homogeneous. They go on to note that norm-referenced tests are 
sometimes used to make criterion-referenced measurements and criter-
ion-referenced tests are sometimes used to make norm-referenced mea-
surements, but that neither strategy is particularly satisfactory.
Both these authors and others (Glaser, 1963) base their arguments
for criterion-referenced tests on the inappropriateness of using
norm-referenced tests for purposes of certification.
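The item-sampling strategy attributed to norm-referenced testing can be illustrated with a small Python sketch. It takes one simple reading of "maximizing variability": for an item scored right or wrong, the score variance is p(1 - p), where p is the proportion of examinees answering correctly, so items that nearly everyone passes or fails contribute little. The function names and response data are hypothetical, and actual test construction typically relies on further statistics that are not modeled here.

```python
def item_variance(responses):
    """Variance p * (1 - p) of a right/wrong item, p = proportion correct."""
    p = sum(responses) / len(responses)
    return p * (1 - p)


def highest_variance_items(item_responses, k):
    """Return the names of the k items with the largest score variance."""
    ranked = sorted(
        item_responses,
        key=lambda name: item_variance(item_responses[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical responses of four students (1 = correct, 0 = incorrect).
items = {
    "easy": [1, 1, 1, 1],    # everyone correct: no variability
    "medium": [1, 1, 0, 0],  # half correct: maximal variability
    "hard": [0, 0, 0, 1],    # one correct: some variability
}
print(highest_variance_items(items, 2))
```

An item that every student answers correctly would be dropped under this strategy, even though it may be exactly the evidence of mastery that a criterion-referenced test is designed to capture.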

The purposes for which tests are used in schools and classrooms 
have recently been examined by Herman and Dorr-Bremme (1984) in 
their national survey of administrators and teachers. Table 2, 
adapted from their technical report (Herman and Dorr-Bremme, 
1984:43), presents the percentages of principals reporting that test
results and other kinds of information are crucial or important for 
particular purposes in the school. 



Insert Table 2 About Here 



We can compare the ratings for the types of formal tests listed 
in the first three columns of the table in terms of two purposes,
selection and certification, which are included in our four-cate-
gory system developed earlier. For assigning students to classes,
an example of selection, norm-referenced tests are rated as impor- 
tant more often than either minimum competency tests or district 
objectives-based tests. This is true for both elementary and secon- 
dary principals and is consistent with what is generally understood 
to be the best use for norm-referenced tests. For student promotion 
decisions, an example of certification, minimum competency tests are 
more often rated as important for this purpose at the secondary
level (as might be expected), but norm-referenced tests are seen as
important more often at the elementary level (though only slightly
more so than district objectives-based tests). But a second trend
overshadows these patterns of responses regarding formal tests. The 
results of teachers' classroom testing are rated as important more 
often than the results of any of the three formal tests, and teach- 
ers' opinions, judgments, and recommendations carry more influence 
than any of the test results. Thus, the source of the information 
(i.e., its generation within the school) appears to be more impor- 
tant than the type of information for influencing decisions. These 
patterns of results are confirmed in teacher responses to questions 
regarding the use of various sources of information for making
classroom decisions (Herman and Dorr-Bremme, 1984:48-55). 

The relationship between academic tasks, the criteria for defin-
ing and judging them, and testing has also been the subject of con-
siderable discussion and inquiry, typically under the rubric of the
relationship between teaching and testing or integrating instruction 
and assessment. Improving the relationship between what is tested 
and what is taught is a major issue in the improvement of testing in 
U.S. schools (National Institute of Education, 1979). The poor fit
of tests to the academic tasks assigned to students has concerned 
educators in particular subject areas (such as social studies) which 
are often outside the basic areas where most test development activ- 
ity is concentrated (Rimmington, 1977), as well as researchers, who 
worry that differences in the degree to which tests correspond to 
academic tasks will produce biased evaluations of educational pro-
grams (Leinhardt and Seewald, 1981).

Rudman, et al. (1980) review a wide range of information on the
integration of assessment with instruction and find few careful ana-
lyses of the relationship between the nature of academic tasks in
classrooms and the content of tests. Leinhardt and Seewald (1981) 
note that analyses of the relationship between teaching and testing 
are expensive and time consuming. They review a number of techni- 
ques for analyzing the correspondence or overlap between teaching 
and testing and conclude that, although all such analyses are com-
plex, those involving elementary education in the basic skills are
somewhat easier to do. This suggests that basic skills testing will 
be the area in which most care will be taken to match testing stra- 
tegies to the nature of academic tasks. 

The practicality of the relatively less complex basic skills
tests also appears to affect the nature of tasks in schools. In the
national sample of administrators and teachers in the Herman and
Dorr-Bremme (1984) study, respondents in both groups reported that
increased testing has resulted in more instruction in the basic
skills. Nearly three-quarters of the principals report that as a
result of testing programs, more instructional time is being devoted
to the basic skill subjects of reading/English and mathematics.
Among teachers, 88% of the elementary teachers, 84% of high school
English teachers, and 74% of high school math teachers reported that
instruction in the basic skills was consuming a substantially 
greater portion of the school's educational resources. Moreover, 
the impact of testing programs in promoting greater attention to the 
basic skills appears to be greater among schools serving students of 
lower socioeconomic status. 

Several recent studies provide some basic descriptive information 
on the use of tests in schools and classrooms. Gullickson (1982) 
surveyed teachers in South Dakota about their testing practices. 
Responses revealed that 89% of elementary teachers and 99% of secon- 
dary teachers relied on some kind of testing, and most tested at 
least weekly (95%) or biweekly (98%). Although teachers reported
using a variety of testing techniques, "...only teacher-made objec- 
tive tests played a major evaluative role across all grade levels 
and curricular areas" (Gullickson, 1982:3). Further, "...teachers
reported teacher-made objective tests as having the greatest role, 
essay tests as having the second largest role, followed by standard- 
ized objective tests and oral quizzes. Of the four, objective tests 
received much higher ratings than did all of the other three. Essay 
tests received high ratings at the secondary level but very low rat- 
ings at the elementary level" (Gullickson, 1982:4). Despite the 
predominance of objective tests, teachers reported believing that 
essay tests provide a better measure of learning, particularly for 
higher cognitive levels (Gullickson, 1984). Finally, teachers 
agreed that tests should not be the only basis for grading students, 
but about half of the respondents reported that tests do provide the
primary basis for arriving at grades (Gullickson, 1984).

The conditions of testing reported by teachers in Gullickson's
study confirm Deutsch's (1979) observation about uniformity to faci-
litate comparisons among students. Gullickson (1982:8) reported
that: 

Testing appears to be a formal, constrained situation in which 
students expect to be graded. Virtually all teachers (99%) do
not allow student interaction during the testing process. A
substantial percentage do not even allow students to ask ques-
tions of the teacher. In addition students are constrained in
their use of support material. Seventy-nine percent of the
teachers do not allow students to use their textbook, notes,
etc., in completing a test.



Despite the controlled conditions under which teachers administer
tests, Gullickson's (1982) analysis raises a number of troubling
questions regarding teachers' testing practices:

First, in the preparation of tests, short answer and matching
items are the most popular items of choice. Both types tend to
be limited to lower cognitive level, i.e., knowledge level,
assessment (Hopkins and Stanley, 1981). Thus, tests probably
assess only lower cognitive level understandings. Second,
while the large majority of teachers reuse items, few teachers
take the time or make the effort to systematically improve
their items. This is suggested by the minimal amount of time
given to test analysis (barely enough to score and grade tests)
and by the minimal use of test statistics. As a direct result,
test item improvement must be done on a very ad hoc and subjec-
tive basis. Third, teachers appear to misuse criterion-refer-
enced tests. On the surface teachers' advocacy of criterion-
referenced testing would indicate evidence of a firm
criterion-referenced testing foundation. However, even if
teachers clearly define their test domain — a topic not
addressed in this survey — they clearly do not address quality
of items in a manner which would insure their items function as
desired. Most reuse their items but without careful item ana-
lysis. Thus, criteria established by teachers are both artifi-
cial and subjective. For without knowing how items function,
it is not possible to accurately set criterion levels for stu-
dent performance (Gullickson, 1982:13-14).
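
The passage above refers to item analysis and test statistics without
describing them. As an illustration only, the following minimal
sketch shows one common classical form of item analysis: item
difficulty (the proportion of students answering correctly) and an
upper-lower discrimination index. It is not taken from Gullickson's
survey. The function name item_analysis, the 27% group cutoff (a
widely used convention), and the sample scores are all assumptions
made for this example.

```python
from statistics import mean


def item_analysis(responses, top_fraction=0.27):
    """Compute per-item difficulty and upper-lower discrimination.

    responses: one list per student of 0/1 item scores (hypothetical data).
    top_fraction: share of students placed in the upper and lower groups;
    0.27 is a common convention, assumed here for illustration.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n_students * top_fraction))
    upper, lower = ranked[:k], ranked[-k:]
    report = []
    for i in range(n_items):
        # Difficulty: proportion of all students answering the item correctly.
        difficulty = mean(s[i] for s in responses)
        # Discrimination: upper-group proportion minus lower-group proportion.
        discrimination = mean(s[i] for s in upper) - mean(s[i] for s in lower)
        report.append(
            {"item": i + 1, "difficulty": difficulty,
             "discrimination": discrimination}
        )
    return report


# Hypothetical scores for six students on a three-item quiz.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
report = item_analysis(scores)
for row in report:
    print(row)
```

An item that nearly all students answer correctly, or one that
high-scoring and low-scoring students answer correctly at similar
rates, would be a candidate for revision before reuse. This is the
kind of check the passage suggests teachers seldom make.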



Herman and Dorr-Bremme's (1984) national survey of administrators
and teachers also provides insight into the basic test use patterns
of teachers. Survey responses indicated that elementary students
spend about four percent of the average instructional time devoted
to reading and about seven percent of the average instructional time
devoted to mathematics taking tests. These elementary students take
a reading test and a math test about once every eight days. About
half of this time is spent on tests mandated by the district or the 
state. Secondary school students appear to spend more time taking 
tests. A typical tenth grade student spends about 13% of the aver-
age instructional time in English completing tests and about 12% of
the average instructional time in mathematics completing tests.
These high school students take an English test and a math test
every three to four days. About one-fourth of this time is devoted
to tests mandated by the district or the state. 



As noted earlier, both teachers and administrators see teacher-
made tests as more important sources of information than district
and state-mandated tests for making a variety of decisions in 
schools and classrooms. In view of the importance accorded teacher-
made tests, Herman and Dorr-Bremme (1984) review some of the same
concerns about the quality of teacher-made tests raised by Gullick-
son (1982). They write that:

Recent research also indicates that teachers remain poorly pre-
pared in assessment (Rudman, et al., 1980; Woellner, 1979; Yeh,
et al., 1981). And as Coii's survey indicates, in-service
training does little to fill the gap. Only about one-fifth of
the teachers responding received staff development related to
selection and construction of good tests or in the use of test
results to improve instruction...In a recent review of teacher-
made tests, Fleming and Chambers (1983) found that teachers
write more questions of the short answer kind than of any other
type; they rarely devise essay examinations. For the most
part, too, the tests reviewed required students to recall facts
and terms. Questions requiring learners to translate, apply,
or otherwise use knowledge were rare. Furthermore, Fleming and
Chambers discovered a "general tendency" to omit test direc-
tions, to use illegible test copies, and "to omit the point
values to be assigned to test questions" (Herman and Dorr-
Bremme, 1984:144).



These reservations about the quality of teacher-made tests are
consistent with the results of Natriello's (1982) interview study of
teachers in four high schools. The interviews revealed that teach- 
ers varied greatly in their approaches to testing and evaluation, 
and many teachers lacked a well articulated approach to the evalua- 
tion of student performance in the classroom. 

Although most research and commentary on the collection of infor-
mation on student performance has centered on testing, alternative
collection methods have been discussed and are used by teachers. 
Gaston (1976) observes that student behavior under testing condi- 
tions often fails to reflect tasks in the affective domain. He sug- 
gests alternatives to collect information about student attitudes 
and behavior, such as monitoring of students' unassigned reading in
the library and listening to student conversations as students leave 
the classroom. Heller (1978) suggests alternatives to standardized 
reading tests such as the use of reading materials from popular
magazines, fables, and poems. Solo (1977) explains how alternatives
such as anecdotal records and collections of students' daily work 
may be used to provide insight into student performance. Herman and 
Dorr-Bremme (1984) note a variety of techniques used by teachers to
collect information on student performance, including routine class
and homework assignments, classroom interaction during question and
answer sessions, recitations, discussions, oral reading, problem- 
solving at the chalkboard, special projects, presentations, and 
reports. 



The national survey by Herman and Dorr-Bremme (1984) revealed
that the teacher's own observations and classwork are more important
than any type of testing for providing information for classroom
decision making, and that teachers' opinions, judgments, and recom-
mendations are more important than any type of testing in school
decision making. Although such practices appear to broaden the base
of information on student performance, there are serious questions
about quality. Reviewing the literature on teachers' collection of
information on student performance other than that supplied by 
tests, Rudman, et al. (1980:58) conclude that: 

Teachers' perceptions of students' behavior is stable and not 
much influenced by data when the new information seems to con-
tradict what they have observed (Pedulla, Airasian, Madaus, and
Kellaghan, 1977; Morine-Dershimer, 1979; Sorotzkin, Fleming,
and Anttonen, 1974; Beggs, Mayer and Lewis, 1972)... In con- 
trast to teachers' perceptions of their students' test scores 
there is some evidence that teachers' reporting of their stu- 
dents' classroom interpersonal behavior is neither stable nor 
accurate (Elmore and Beggs, 1972; Barnhard, Zimbardo, and Sara- 
son, 1968; Openshaw, 1967; Feshbach, 1969; Tolor, Scarpetti, 
and Lane, 1967) . 

Teachers seem not to be accurate observers of pupils' academic
behavior. Several examples in the literature illustrate teach- 
ers' observations of oral reading by their pupils. Regardless 
of the amount of training or experience, teachers appear to 
make poor judges of the oral reading behaviors of students.
(Ladd, 1961; Page and Carlson, 1973; Allington, 1978). 

Thus, there is no shortage of serious questions about the use of 
tests and alternative methods for collecting information on the aca- 
demic performance of students. 



Appraising Student Performance



Appraising performance in a well developed evaluation system 
involves comparing the information collected on student performance
on assigned tasks with the criteria and standards previously estab-
lished for those tasks. But even in a well articulated evaluation
system, evaluators are expected to exercise judgment and discretion. 
As Dornbusch and Scott (1975) observe: 

The application of standards in specific situations is rarely a 
simple or straightforward procedure. It requires judgment with
respect to the comparability of the performance situation and 
the situations for which the standards are considered applica- 
ble. Similar kinds of judgments are required in employing the 
specified property weights in combining scores to produce a 
performance evaluation. In short, appraisal is seldom a 
mechanical procedure. Moreover, task appraisal entails decid- 
ing how to interpret a low or high performance score. Accu- 
rately appraising a task performance requires knowledge of 
extenuating circumstances, whether it be the inexperience of 
the task performer, the lack of facilities, or assistance 
received from a more skilled co-worker. Such information is of 
critical importance in determining what, if any, message is to 
be communicated to the performer concerning the quality of his 
or her task performance (Dornbusch and Scott, 1975:143). 



For some reason, the exercise of discretion that is expected of 
most evaluators is typically not expected of teachers by researchers 
who study the appraisal process. Indeed, the assumption has been 
that teacher appraisals which vary from the results of standardized 
tests of student performance are somehow flawed. Much of the liter-
ature on the appraisal process in the evaluation of student perfor-
mance has focused on deviations of the appraisals from results of
standardized tests. Such deviations are often characterized as 
teacher bias. The same perspective has been advanced by others to 
criticize standardized tests themselves, despite evidence that the 
major tests are not biased (Arnold, 1983). 



Studies of teacher bias in appraising student performance have 
examined the effects of student characteristics on teacher apprais-
als of performance. After an extensive review of this literature,
Natriello and Dornbusch (1984) concluded that four major problems 
with these studies limit the quality of the conclusions that might 
be drawn from them. First, although the literature suggests that 
certain groups of youngsters are more likely to be impeded academi- 
cally by unsound teacher appraisals, the connection between teacher 
behaviors or attitudes and student achievement is assumed rather
than documented. Second, these studies have included the currently 
popular student characteristic or characteristics; few studies have
developed a theoretical or logical rationale for including a parti- 
cular set of characteristics. Thus they provide little insight into 
the processes by which student characteristics affect teacher 
appraisals or the relative effects of these characteristics. Third, 
the varying conditions under which the studies have been conducted 
and the failure to specify the scope of the studies make it diffi- 
cult to accumulate knowledge on the conditions under which such 
findings are likely to apply. Fourth, most studies of the influence 
of student characteristics on teacher appraisals have not considered 
differences in immediate student performance and behavior in the 
classroom. Thus it has not been possible to determine if reported 
differences in teacher appraisals are the result of differences in 
student characteristics or in actual student performance. 

Egan and Archer (1985) observe that the decision to examine
teacher appraisals of students using experimental models of prejud-
ice borrowed from social psychology (e.g., Rosenthal and Jacobson,
1968) is in contrast to the study of diagnosis in other professions
where accuracy and rationality of the appraisal are assumed and 
interest is directed to the strategy of the appraisal process. Egan
and Archer (1985) compare teacher appraisals of student ability in
mathematics and English with appraisals inferred from standardized
tests. They conclude that "...there is little basis for a claim
that teachers' ratings are inaccurate — not because their ratings can
be shown to be accurate, by reference to some predetermined measure
of true ability, but because we cannot produce a rational strategy
of classification that is similar to theirs and that gives substan-
tially better results" (Egan and Archer, 1985:32).

Egan and Archer (1985) see little justification for continuing to 
study teacher ratings of students as a type of irrational cognition.
Instead, they suggest research that focuses on the rational aspects
of teachers' ratings. For example, in their own study, they observe
that teachers were reluctant to use extreme categories and they
overused the upper quintiles. Egan and Archer suggest that such
patterns might be interpreted in terms of the cognitive psychology
of teacher appraisals.

Other studies provide additional examples of how the rational 
appraisal processes of teachers might be examined. Elmore and Beggs
(1972) found that teachers tend to rate students on the most recent
incident that reflected a specific behavior rather than on more glo-
bal behaviors. Natriello and Dornbusch (1983) found that teachers'
ratings of students reflected particular classroom behaviors and
performance as opposed to general performance and behavior histories
and student status characteristics. Ryan and Levine (1981) studied 
the impact of sequences of students' past performances on teacher
appraisals and found that although the final performance was an
important determinant of evaluators' ratings, a simple recency model
did not adequately account for all of the data — prior performance 
also influenced the appraisal. 

Teachers also appear to make important discriminations regarding 
the quality and nature of the information they use in formulating 
appraisals. Borko and Shavelson (1978) found that teacher attribu- 
tions to student ability were influenced by the reliability of the 
information they had available for assessment. Levin, Imms, and 
Vilmain (1980) found that college students placed in the teacher 
role in a series of experiments placed less weight on a source of 
information seen to be less reliable, but that they did not use the 
relative variability of scores as an indicator of reliability. 
Pedulla, Airasian, and Madaus (1980) found that teachers could not 
separate their judgments about academically related student behav-
iors observed on a daily basis from their judgments about students'
standing on IQ, mathematics, and English, but that teachers could
disentangle social behaviors from academically related behaviors. 

Studies of the appraisal of student performance might seek to 
interpret the observed problems in terms of the earlier stages of 
the evaluation process. For example, Brown (1971) attributes much
of the unreliability of teacher appraisals to the fact that teachers
use quite different criteria in evaluating students. Starch and
Elliott (1912) relate differences in teacher appraisals to differ-
ences in school and teacher standards. Stockhard, Lang, and Wood
(1985) found differences in the extent to which student background 
factors influenced evaluations in English and mathematics, thus sug- 
gesting the importance of further study of the role of tasks in the 
evaluation process. Geisinger and Rabinowitz (1980) found relation- 
ships between the type of standards and the sampling method employed 
by college instructors and the average course grades. Higher grades
were given by instructors who adopted criterion-referenced standards
(r = .08) or self-referenced standards (r = .28), while lower grades
were given by those who adopted norm-referenced standards (r =
-.29). Higher grades were given by instructors who sampled student
performance through classroom participation (r = .27), term papers
and book reports (r = .25), and special projects (r = .37), while
lower grades were given by instructors who sampled student perfor-
mance through examinations and quizzes (r = -.15). Webster and Ent-
wisle (1976), drawing on expectations-states theory (Berger, Cohen,
and Zelditch, 1972), develop a theoretical perspective to organize
and understand the processes by which appraisals are affected by 
factors other than objective criteria (e.g., halo and demon effects, 
Gibb, 1983; Symonds, 1925). Studies of this type provide models of 
the kind of research that will advance thinking about the appraisal 
process in the context of an appreciation of the broader evaluation 
process. 



Providing Feedback on Student Performance



An appraisal of a student performance may need to be communicated
to various audiences, depending upon the purpose of the evaluation
process. Such audiences may include the student, parents, school
officials, and potential employers (Ahmann and Glock, 1967). The
nature and extent of communications regarding student performance
have been the subjects of various investigations and commentaries.

Much of the discussion of feedback on student performance focuses 
on the visible trappings of traditional evaluation systems in 
schools and classrooms — grades and report cards. Jarrett (1963)
reviews trends in report cards and notes the movement from reporting 
based on a percentage system to reporting on the basis of letter 
grades in secondary schools. He notes the trend in the sixties of 
moving away from grades toward other methods of reporting. Jarrett 
(1963) reports on a survey of 258 secondary schools in which it was 
found that 81% used letters or other symbols, 26% used percentages, 
9% used class ranks, 3% used percentile ranks, 2% used written 
records or logs of student progress, 1% used accomplishment quo-
tients, and 1% used sigma scores. He summarizes then-current trends
as: 

(1) less frequent reporting for all pupils; (2) more frequent 
reporting in cases of exceptionally good or exceptionally poor 
performances; (3) ratings on many more traits and abilities 
than formerly; (4) making the reports more and more descrip- 
tive; and (5) reporting for the purpose of furthering pupil 
growth (Jarrett, 1963:46). 



Chansky (1975) reports on a more recent study of report cards in 
two percent of school districts nationwide. His analysis considered 
four major features of reporting: the opening comments, the aca- 
demic items noted, the personal qualities noted, and the rating sys- 
tems employed. He found that the use of statements of the purposes 
of the evaluation declined from the primary grades through high 
school, the number of academic items marked declined from a high in 
the primary grades to a low in high school, socio-emotional traits 
tended to emphasize growth in the lower grades and deviance in the 
higher grades, and a variety of rating systems were used. In addi-
tion, Chansky (1975) classified the rating systems both in terms of
the number of categories used and in terms of the content of the 
category systems. Table 3 presents both the number of rating cate- 
gories and the content of the categories for the schools in Chan- 
sky's survey. 



Insert Table 3 About Here 



The patterns of responses indicate that the higher the grade level, 
the more rating categories likely to be used and the greater the 
variety of reporting systems. 

A number of commentators have suggested alternatives to tradi-
tional grading and reporting. Rudman (1978) suggests reporting dev-
ices such as checklists that are more closely related to the mechan-
isms for recording student performance. Ediger (1975) suggests more
frequent and more varied mechanisms for reporting student perfor-
mance, such as telephone and face-to-face conferences with parents.
Giannangelo and Lee (1974) and Giannangelo (1975) describe a system 
of computer-assisted reporting that provides more anecdotal informa- 
tion on student performance. Holtz (1976) presents a reporting 
method for student performance in elementary science more clearly
related to evaluation criteria. Walling (1975) discusses five broad
categories of reporting techniques — traditional grades, percentage
ratings, checklists of objectives, narrative evaluations, and con-
ferences. Stewart (1975) describes a multi-dimensional reporting
system for use in elementary schools. 

Gullickson (1982) reports on the processes used by teachers to 
provide feedback on tests to students. Most of the teachers in this 
study provided a grade rather than just a numerical score on tests. 
In addition, 90% of the teachers reported providing written comments 
at least occasionally and 55% of the teachers reported providing 
written comments usually or always. These teachers attempted to 
provide feedback in a timely manner — 7% returned tests the same 
day, 83% returned tests within one day, and only 6% required more
than two days to process tests and return them. Gullickson (1982) 
also asked teachers to classify their feedback activities. The 
average teacher in his study spent 20 minutes in class review of a 
test and averaged nine minutes reviewing items selected by the 
teacher, eight minutes reviewing items questioned by students, and 
three minutes reviewing the grading procedures. Keep in mind that
the teachers in Gullickson's study tended to rely on short answer
and objective tests. In a study of the types of feedback used in
classrooms, Zahorick (1968) found that teachers relied upon a lim-
ited number of techniques for reviewing test items, and very few
teachers indicated why a particular response had merit.

Natriello's (1982) interview study with teachers in four high 
schools revealed a wide range of activities designed to provide
feedback to students. Although most of the teachers used tradi-
tional methods to provide feedback (e.g., written comments, confer-
ences, etc.), some had developed innovative techniques. An English
teacher provided audio cassette tapes of comments on student papers,
and a physical education teacher kept an "open gradebook" that stu-
dents could examine at any time. Other teachers had students tally
their own cumulative scores at various points in the grading period,
and still others had students chart their own progress on a regular 
basis. 

A number of observers have remarked on the relationship of the
feedback process to other aspects of the evaluation system. Slavin
(1978:98) notes that "Feedback is a complex issue, as it has diffe-
rent meanings and uses depending on the way in which it is used."
He distinguishes among three kinds of communications regarding stu-
dent performance as they relate to three purposes of evaluation:
"informational feedback," which should tell students where they
stand compared to other students and thus should be based on norm-
referenced standards; "performance feedback," which should provide
students with information on their day-to-day performance and pro-
vide direction for improvement and thus should be based on criter-
ion-referenced standards; and incentive feedback, which should
enhance student motivation and thus should be both timely and based
on tasks that are neither too difficult nor too easy. Slavin's
three types of feedback correspond to the selection, direction, and
motivation purposes of evaluation systems noted earlier. Slavin
suggests that a single system of evaluation cannot serve all three
functions and urges the creation of parallel evaluation systems.

Lissman and Paetzold (1983) also noted the heterogeneous nature
of feedback on achievement as it relates to the purposes of evalua- 
tion. They distinguished between informative feedback and motiva- 
tional feedback. Hansen (1977) proposed a system of personalized 
feedback on achievement that is consistent with the directive pur- 
poses of evaluation. Cross and Cross (1980) suggested that teachers 
who devote more time to writing evaluative comments believe that
such feedback will facilitate student motivation. 

Relationships between feedback and other aspects of the evalua- 
tion process have also been noted. Lintner and Ducette (1974) noted 
the impact of task variables, particularly task ambiguity, on stu-
dent responsiveness to praise. Lissman and Paetzold (1983) observed
that certain kinds of feedback seemed more effective for certain
kinds of tasks. Oren (1983:307) noted the relationship between
"rich, more specific, and individualized" feedback and the motiva- 
tional purposes of evaluation, specifically, the attributional ten- 
dencies of students. 




The Effects of Evaluation Processes on Students

Although the evaluations that take place in schools and class- 
rooms clearly have powerful effects on students and others (e.g., 
see Pooler 1979), consideration of studies of these effects has been
deferred until now for several reasons.

First, relatively little descriptive information on evaluation
processes in schools and classrooms has been considered in designing 
effects studies, even though many studies seek to create new know- 
ledge as the basis for improved practice. Thus, the descriptive 
information on evaluation in schools and classrooms reviewed above 
provided important groundwork for consideration of the effects stu-
dies. For example, some studies seek to develop alternatives to
norm-referenced standards, but descriptive accounts suggest that
such standards may not now be used extensively by teachers. 

Second, most of the effects studies concentrate on only one or 
two aspects of the evaluation process outlined above, and thus fail 
to consider the impact of other key elements. The conclusions drawn 
from such studies should be approached with caution. For example, 
few studies consider the nature of the assigned tasks upon which 
students are being evaluated, yet it is clear that task differences 
condition the impact of evaluation processes. 

Third, few of the effects studies consider the multiple purposes
of evaluations in schools and classrooms. They often compare the
impact of different evaluation methods on some outcome that has
nothing to do with the purpose for which one of the methods was 
developed. For example, a study demonstrating that differentiated 
feedback contributes more to directing future student performance 
than a single letter grade may be simply showing that an evaluation
system created for the purpose of providing direction to students 
does a better job of providing that direction than another evalua- 
tion system created for the purpose of selecting students. 

With the above reservations clearly in mind, it is useful to 
review the effects of some selected aspects of evaluation processes. 

Investigators are only beginning to recognize the importance of 
classroom tasks in understanding educational and evaluation pro-
cesses (Doyle, 1983). A particularly interesting line of research
in this area focuses on the impact of the task structure of class- 
rooms on students' conceptions of the distribution of ability in the 
class. In a study of fifth- and sixth-graders, Rosenholtz and Wil-
son (1980) found that in classes characterized by what they called
higher "resolution" (i.e., less task differentiation, more ability
grouping, more evaluations comparing the work of one student with
another, and less student autonomy to choose tasks) there was higher 
concurrence among classmates, between self and classmates, between 
teacher and classmates, and between self and teacher in ratings of 
reading ability. Rosenholtz and Rosenholtz (1981) found that these 
same high "resolution" classroom structures led to more dispersed 
evaluations of reading ability by students themselves, by class- 
nateSf and by teachers. They found that low "resolution" 



-52- 



fr 7 



(dinensional) classroom structures diminished the effect of evalua- 
tions the teacher on peer evaluations of an individual's reading 
ability. 

In a study of third grade classrooms Simpson (1981:127) found 
that low levels of curricular differentiation (one element of unidi- 
mensional classroom structure) led to "...a more nearly normal dis- 
tribution of self-reports of ability by increasing the proportion of 
students reporting ability levels below average and far below aver- 
age." Moreover, low curricular differentiation also appeared to lead 
to a more generalized view of academic ability, greater peer consen-
sus about students' performance levels, and to greater influence of
peers on individuals' self-reported ability. These studies suggest
that the consistency or differentiation of task assignments, crite- 
ria, standards, sampling strategies, and feedback mechanisms may 
affect the perceived distribution of ability. 

Dornbusch and Scott (1975) make the point that criteria add to 
the definition of the assigned task and direct the attention of per- 
formers to the key elements of the task for which they will be held 
accountable. Schunk (1983) reports on a study in which some chil- 
dren were offered rewards for participating in a task, others were 
offered rewards for careful work on the task, and still others were 
not offered rewards until they had completed the task. The results 
indicated that the second group of children, who had received both a
task assignment and information on the criteria for performance,
showed the highest levels of skill, self-efficacy, and rapid problem
solving.

This should not be surprising. As Deutsch (1979:396) points out,
"students are in a bewildering position if a teacher marks them
without telling them in sufficient detail the values, rules, and 
procedures employed in his or her grading. In such a situation, the 
mark-oriented students are necessarily anxiously dependent on the 
teacher's approval, since they have no other basis for guiding their 
behavior to achieve merit... where the instructor is explicit in 
his or her style of grading, the student can be more independent of 
the teacher." 

Natriello (1982) found that more than 30% of the students in his 
study of four suburban high schools reported that they had received 
unsatisfactory evaluations because they had misunderstood the crite- 
ria by which they were to be evaluated. Smith (1984) observed that 
clarity has been demonstrated to be an important component of teach-
ing in research on teaching effectiveness (Rosenshine and Furst,
1971). Smith studied the impact of teacher "use of uncertainty
phrases" on student achievement and found that such phrases nega- 
tively affected achievement. 

However, explicitness may have undesirable effects as well. 
Deutsch (1979) notes that explicit evaluation systems may lead 
mark-oriented students to limit their work to what is being assessed 
by the procedures employed in the grading or to attempt to outwit 
the procedures. He cites as an example managers in the Bell System 
who are graded or evaluated by "profit indices" and who often outwit 
the system by postponing routine maintenance costs, which results in
equipment breakdowns several years later when successful managers
have moved on to new positions. Deutsch (1979) concludes that such
dilemmas are avoidable only to the extent that the evaluation system
fosters the motivation to achieve intrinsic merit rather than its 
external symbols. 

The effects of performance standards seem to be more complex than 
is typically thought. Investigations have focused on both the level 
of standards and the type of standards used in evaluation systems. 
Early studies of the impact of school standards on student perfor-
mance (Brookover and Schneider, 1975) seem to have survived the
challenge that the correlation between teacher standards and student
performance could result from the impact of the latter on the former
(Crane and Mellon, 1978). Findings from the school effectiveness
literature (Purkey and Smith, 1983), the teacher expectations liter-
ature (Brophy and Evertson, 1981) and the task goals literature
(Locke, 1968; Rosswork, 1977) suggest that higher standards yield
better student performance. In studies specifically focused on
evaluation processes, Natriello and Dornbusch (1984) found that
higher standards led to greater student effort on school tasks and
to students being more likely to attend class, and Natriello and
McDill (1986) found that when teachers had standards for homework,
students were more likely to spend time on homework.

However, the effects of higher standards may not be uniformly
positive. Natriello (1982) found that students who perceived stan-
dards for their performance as unattainable were more likely to
become disengaged from high school. McDill, Natriello, and Pallas
(1985) suggested that higher standards may actually have detrimental
effects for at-risk students in secondary schools. There seems to 
be a curvilinear relationship between the level of standards and 
student effort and performance. The goal would seem to be to chal- 
lenge students without frustrating them (Atkinson, 1958). 

The impact of different types of standards has also been investi- 
gated. Perhaps the most attention has been devoted to norm-refer- 
enced standards or "grading on the curve." Michaels (1977) desig- 
nates the reward structure associated with this practice as 
"individual competition, in which grades are assigned to students 
based on their performances relative to those of their classmates"
and distinguishes it from "individual reward contingencies, in which 
grades are assigned to students on the basis of how much material 
each student apparently masters." He considers the effects of these 
two reward structures along with two other reward structures (group 
competition and group reward contingencies) on student academic per-
formance. Reviewing the relevant literature, he concludes that
individual competition consistently produces superior academic per- 
formance. However, he observes that the superior academic perfor- 
mance found to be associated with individual competition may be lim- 
ited to the top third of the class, to those students who are most 
responsive to the reward structure, for several reasons: First, the
value of grades may vary considerably across students; second, the
probability of receiving high grades also varies considerably across
students; third, performance gains by initially low-performing stu-
dents may be seldom reinforced in systems of individual competition.
Michaels concludes by arguing that the reward structure itself may
be less important than seeing to it that the rewards selected are
valued by all students, are made contingent on the performance to be 
strengthened, and that significant performance gains are intermit- 
tently reinforced. 
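
The distinction Michaels draws between individual competition and
individual reward contingencies can be illustrated with a minimal
sketch. The student names, the scores, the cutoff of 80, and the number
of high grades available are hypothetical values chosen only for
illustration; none of them comes from the studies reviewed here.

```python
def grade_on_curve(scores, n_high=2):
    """Individual competition: only the n_high top-ranked students get a high grade."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {name: "high" if rank < n_high else "low"
            for rank, name in enumerate(ranked)}


def grade_on_mastery(scores, cutoff=80):
    """Individual reward contingency: a high grade for anyone who reaches the cutoff."""
    return {name: "high" if score >= cutoff else "low"
            for name, score in scores.items()}


# Hypothetical class scores before and after every student gains 12 points.
before = {"Ann": 95, "Ben": 88, "Cal": 84, "Dee": 70, "Eve": 62, "Fay": 55}
after = {name: score + 12 for name, score in before.items()}

for label, scores in (("before", before), ("after", after)):
    print(label, "curve:  ", grade_on_curve(scores))
    print(label, "mastery:", grade_on_mastery(scores))
```

In this toy class a uniform gain leaves every grade on the curve
unchanged, which is one way to picture why gains by initially
low-performing students may go unreinforced under individual
competition, while under the mastery rule one additional student
(Dee) crosses the cutoff.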

Deutsch (1979) criticizes individual competition or grading on
the curve as an artificially created shortage of good grades. He
argues that the "Disappointing rewards, induced by an artificial
scarcity, are likely to hamper the development of educational merit
and the sense of one's own value" (Deutsch, 1979:394). Moreover,
under individual competition, "students are more anxious, they think
less well of themselves and of their work, they have less favorable
attitudes toward their classmates and less friendly relations with
them, and they feel less of a sense of responsibility toward them"
(Deutsch, 1979:399).

Examining the same studies as Michaels, Deutsch (1979:398) con-
cludes that a number of these studies were flawed because they did
not equate the objective probability of reward in the reward struc-
tures being compared. Deutsch's reanalysis of these studies shows
"no systematic differences in performance on isolated work under
several different reward systems." Williams, Pollack, and Ferguson
(1975) also found no significant differences between the achievement
and self-reported attitudes or school-related behavior of students
exposed to norm-referenced and criterion-referenced standards. They
also found that criterion-referenced standards enabled some students
who performed poorly initially to increase their performance on
later tests, but students who did well initially began to work less
hard than students working under a norm-referenced system who had to
deal with the possibility that other students would try harder on 
the next test and raise the curve. 

Deutsch (1979:394) also argues that the competitive struggle for
scarce goods in the classroom teaches students about more than just
their own performance. He notes that they "...are socialized into
believing that this is not only the just way but also the natural
and inevitable way of allocating scarce values in the larger, imper-
sonal, nonfamiliar world. They also learn that there are winners
and losers in such competitions and that, although it is possible 
for them to win, they are more likely to lose." 

Finally, Deutsch (1979) points out that the artificially induced 
scarcity of grades lends them importance. In fact, it is one of the 
chief means of conveying meaning to grades, which themselves are
typically of uncertain quality and unspecific meaning. 

Norm-referenced standards have also been compared to individual- 
ly-referenced standards for their effects on student performance. 
Beady, Slavin, and Fennessey (1981) found no differences in the 
effects of norm-referenced standards and individually-referenced 
standards among students participating in a program of focused 
instruction, a particular model of direct instruction. On the other 
hand, under different task conditions Rheinberg (1983) found that
students working under individually-referenced standards showed more
realistic strategies of goal setting, more often attributed their
successes to their own effort, and performed better than students 
working under norm-referenced standards. 

Bolocofsky and Mescher (1984) added complications to the issue of
the impact of different standards by considering the effects of 
different standards for students who differ in self-esteem and locus 
of control. They found that students with different characteristics 
performed differently under different kinds of standards. Self-re- 
ferenced standards worked best with students with low self-esteem 
and internal locus of control. Criterion-referenced or absolute 
standards worked best with students with low self-esteem and exter- 
nal locus of control. Norm-referenced standards worked best with 
students with high self-esteem regardless of locus of control. 

Many studies have examined the impact of different types of stan- 
dards on student cooperation and competition. These studies typi- 
cally examine the relationships between the evaluations made and 
rewards distributed and the tendency for students to perform tasks
independently, cooperatively, or competitively. Slavin (1977:63^) in
a review of much of this research uses the term "interpersonal
reward structure" to refer to the dependence or lack of dependence
of any given student on any other student. He distinguishes three
types of interpersonal reward structures: competitive reward struc-
tures, where the probability of one student receiving a reward is
negatively related to the probability of other students receiving a
reward; independent reward structures, where the probability of one
student receiving a reward is unrelated to the probability of other
students receiving a reward; and cooperative reward structures,
where the probability of one student receiving a reward is posi- 
tively related to the probability of other students receiving a 
reward. 
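
Slavin's three definitions turn on the sign of the relationship between
one student's chance of reward and another's. The sketch below is one
hedged way to make that concrete for two students. The three reward
rules, the score levels 1 through 5, the cutoffs, and the assumption
that all score pairs are equally likely are illustrative choices, not
part of Slavin's review.

```python
from fractions import Fraction
from itertools import product


def competitive(a, b):
    """Only the higher scorer is rewarded (a tie rewards no one)."""
    return a > b, b > a


def independent(a, b, cutoff=3):
    """Each student is rewarded for reaching the cutoff on his or her own."""
    return a >= cutoff, b >= cutoff


def cooperative(a, b, cutoff=6):
    """Both students are rewarded when their combined score reaches the cutoff."""
    return a + b >= cutoff, a + b >= cutoff


def dependence(rule, levels=range(1, 6)):
    """P(both rewarded) - P(A rewarded) * P(B rewarded) over equally likely score pairs."""
    outcomes = [rule(a, b) for a, b in product(levels, repeat=2)]
    n = len(outcomes)
    p_a = Fraction(sum(ra for ra, _ in outcomes), n)
    p_b = Fraction(sum(rb for _, rb in outcomes), n)
    p_both = Fraction(sum(ra and rb for ra, rb in outcomes), n)
    return p_both - p_a * p_b


for rule in (competitive, independent, cooperative):
    print(rule.__name__, dependence(rule))
```

Under these assumptions the measure is negative for the competitive
rule, exactly zero for the independent rule, and positive for the
cooperative rule, matching the "negatively related," "unrelated," and
"positively related" wording of the three definitions.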

Slavin (1977:644) reviewed the research on the impact of these
reward structures on student social behavior and academic perfor-
mance in the classroom. He concluded that cooperative structures
enhance social behavior along a number of dimensions, including
interpersonal attraction, friendliness, positive group evaluation,
helpfulness, and cross-racial interaction. Competitive and indepen- 
dent reward structures were found to be more effective in increasing 
performance when tasks required little cooperation or when there was 
little opportunity to share resources to facilitate performance, but 
Slavin noted that cooperative structures should be effective in pro- 
moting performance when such cooperation and sharing are necessary 
and permitted. 

A number of investigations have focused on the frequency of the 
sampling process, especially the frequency of testing. Reviewers of
the research on the frequency of testing (Feldhusen, 1964; Peckham 
and Roe, 1977) have found that although early studies of testing 
frequency indicated that more frequent testing had uniformly posi- 
tive effects on student learning and motivation, more recent studies 
incorporating more variables suggest that more frequent testing may
not benefit all students in all contexts. However, considering
evaluation activities as contests, Deutsch (1979:396) concludes that
"The existence of many diverse contests diffuses competition and
reduces the negative implications of any particular contest: It is
less harmful to one's self-esteem and social standing." 

Studies of testing frequency have not typically viewed testing as 
part of a larger evaluation process. In the model developed here, 
however, testing is merely one method of sampling student perfor- 
mance and outcomes. Viewed in this way, the frequency of testing 
issue can be more appropriately stated as one of selecting an appro- 
priate interval to collect samples of student performance on parti- 
cular tasks to be evaluated in terms of particular criteria. Cer- 
tain student tasks may require more extensive and/or more frequent 
sampling procedures to insure that the appraisal process is based on 
valid and reliable samples of student performances and outcomes. 
Objections that frequent evaluation raises student anxiety must be
balanced against the preferences of students that the teacher have 
more extensive and more representative samples of their work. Of 
course, overly frequent evaluation may have negative effects on stu- 
dent motivation and performance when it disrupts performance itself. 
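
The claim that some tasks require more extensive sampling to yield
reliable appraisals can be sketched with the ordinary standard error of
a mean. This is only an illustration under simplifying assumptions that
the report itself does not make: each sample of performance is treated
as an independent draw, and `spread` is a hypothetical standard
deviation of a student's performance from one occasion to the next.

```python
import math


def standard_error(spread, n_samples):
    """Standard error of the mean of n_samples independent performance samples."""
    return spread / math.sqrt(n_samples)


def samples_needed(spread, target_error):
    """Smallest number of samples whose mean has a standard error <= target_error."""
    return math.ceil((spread / target_error) ** 2)


# A task on which performance varies more from occasion to occasion needs
# more samples to be appraised with the same precision.
print(samples_needed(spread=10, target_error=5))
print(samples_needed(spread=20, target_error=5))
```

On these assumptions, doubling the occasion-to-occasion spread
quadruples the number of samples needed for the same precision, which
is one reading of why the appropriate sampling interval depends on the
task being evaluated.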

Consideration of the appraisal process focuses attention on the 
connection between student performances and the evaluations made of 
those performances by teachers, often from the perspective of the 
teacher attempting to carefully relate performance information to 
predetermined tasks, criteria, and standards. The quality of the
connection between student performance and evaluations also appears
to have important effects on students. Natriello and Dornbusch
(1984) found that when students perceived the evaluations of their
performance on school tasks to be unsound (i.e., not to accurately
reflect their effort and performance), they were less likely to con-
sider these evaluations important and less likely to devote effort
to the associated tasks. 

But these effects may be more complicated as indicated by work on 
the theory of learned helplessness which suggests that experiencing 
uncontrollable outcomes should depress performance (Abramson, Selig-
man, and Teasdale, 1978; Seligman, 1975), and by work which suggests
that experiencing uncontrollable outcomes facilitates increased per-
formance by producing an increased need for control (Roth and Boot-
zin, 1979; Thornton and Jacobs, 1972). An integrative model devel-
oped by Wortman and Brehm (1975) suggests that brief exposure to
uncontrollable outcomes will lead to improved performance while 
extended exposure will lead to decreased performance. Research 
involving high school students (Buys and Winefield, 1982) finds only 
decreased student performance in reaction to the experience of 
uncontrollable outcomes, a pattern the authors link to the rela- 
tively less self-reliant and less self-confident nature of high 
school students compared to adults, and to the nature of the school 
environment, which they see as tending to foster helplessness. 

Students may differ in their perceptions of appraisal processes 
independent of the process itself. Evans and Engelberg (1985) found
that older and higher-achieving students understood grading prac-
tices better than younger and lower-achieving students, and that
younger and lower-achieving students were more likely to attribute
grades to external and uncontrollable factors while high achievers
and older students attributed grades to internal and controllable
factors. 

A number of studies have examined the impact of the feedback pre-
sented as part of the evaluation process. Stewart and White (1976)
present the results of their own study and review those of 12 others
which attempted to replicate Page's (1958) classic study of the
effects of feedback. Page found that "when the average secondary
teacher takes the time and trouble to write comments (believed to be
'encouraging') on student papers, these apparently have a measurable
and potent effect upon student effort, or attention, or attitude, or
whatever it is which causes learning to improve..." (Page,
1958:180-181). Stewart and White (1976) reach a slightly less con-
fident conclusion, noting that the positive effect obtained by Page
may depend upon the particular learning conditions and the nature of
the teacher comments. Cross and Cross (1980) found that personal-
ized encouraging comments from the teacher used in addition to a
grade on tests and assignments enhanced the "internality" of stu-
dents in an inner-city junior high school.

Feedback may also affect students in schools and classrooms other
than those to whom the feedback pertains. A study of third graders
by Simpson (1981) illustrates how evaluative feedback decisions can
affect students' perceptions of the ability levels of their
classmates. Simpson (1981:124) argues that "Grades are singular
symbols taking on unidimensional comparative meaning from the
abstract numerical system which defines them. Frequent grading is
capable of reducing even relatively complex performances to a single
dimension, because grades reduce information to numbers, because
these numbers can be averaged, and because teachers and student
peers can use these numbers to place students on a single global
stratification scale." Simpson finds that in classrooms where teach-
ers report "always" or "usually" grading student work (as opposed to
those in which they "never" or "seldom" grade such work), where they
report using few kinds of instructional materials, and where they
seldom use alternative media and seldom allow students to choose
their tasks, there was greater dispersion among students' reported
ability levels, greater generalization of students' reported ability
levels, greater peer consensus as to students' relative performance
levels, and greater peer influence over students' reported ability
levels. Thus, the use of grades seems to lead to more pronounced
and more powerful ability stratification processes in the classroom.

A similar effect on the distribution of attributional tendencies
in classrooms was found by Oren (1983), who explored the effects of
evaluation feedback on the attributional tendencies of students.
Results indicated that in classrooms with differentiated, specific,
and individualized feedback, the attributional tendencies of low
achievers were more like those of high achievers. Specifically, low
achievers in such classrooms scored higher on internal control than
did low achievers in classrooms with less differentiated feedback
systems. 

The affective value of feedback has also been shown to affect 
attributions in classrooms. Meyer, Bachmann, Biermann, Hempelmann,
Ploger, and Spiller (1979) report on a series of six experimental 
studies which investigated the extent to which praise and criticism 
in response to task performance provided information about others'
perceptions of a focal actor's ability. In these studies subjects 
were presented with descriptions of two students who had obtained 
identical results at a task. One of the students received neutral 
feedback while the other was praised for success or criticized for 
failure. Studies using adult subjects revealed that praise after
success and neutral feedback after failure led to the perception 
that the focal actor's ability was low, and neutral feedback for 
success and criticism after failure led to the perception that the 
focal actor's ability was high. However, these findings varied by 
the age of the respondents. Third-grade students believed that the 
student praised by the teacher was the brighter one; students in 
grades 4 to 7 selected the praised student and the student receiving 
neutral feedback in approximately equal numbers; and students in 
grades 8 and above believed that the student receiving neutral feed-
back was brighter than the one receiving positive feedback following
successful performance. 

Although the effects of feedback in the classroom appear to be 
powerful, they are multidimensional and complex. Simple injunctions
to increase feedback for one purpose or another are likely to set in 
motion a range of processes that need further examination. 

Although the above studies of the effects of aspects of the eval-
uation process have suggested some possible consequences for certain
evaluation processes, little attention has been devoted to develop-
ing an understanding of entire evaluation systems composed of pur-
poses, tasks, criteria, standards, samples, appraisals, and feed-
back. One of the key issues to be examined in thinking about
systems of evaluation is the relationship between various aspects of
the process and the extent to which there is consistency among them.
For instance, evaluations and evaluation systems may differ in con-
sistency between task assignments and criteria set for the task.
Some teachers may take care that the performance criteria set for a
task be appropriate to the nature of the task assignment, while oth-
ers may not; for example, a teacher may designate a task as a
creative opportunity when an assignment is made but hold students
accountable for a formulaic set of criteria. A second instance might
be the consistency between the criteria and standards set for the
task and the process of sampling student performances and outcomes.
A teacher may specify criteria related to the actual performance of
the task (e.g., how to proceed to solve a math problem), but only
sample the outcome of the performance (e.g., the correctness of the
answer).

Little research has examined the extent to which teachers imple-
ment a consistent system of performance evaluation for students.
Interviews conducted by Natriello (1982) with 80 secondary school
teachers suggest that teachers vary widely in their ability to arti-
culate a systematic approach to the evaluation of student perfor-
mance. Also, examinations of teacher preparation curricula indicate
that prospective teachers receive little or no training in the eval-
uation of student performance (Mayo, 1967; Roeder, 1973). The
effects of this lack of consistency could be quite negative.
Natriello (1982) reported that high school students who experienced 
more inconsistencies in the evaluation system were also more likely 
to become disengaged from school. In that study students were asked 
to report on the extent to which they perceived incompatibilities or 
inconsistencies in the evaluation processes to which they were sub- 
jected. Students who reported being exposed to such incompatibili- 
ties were more likely to report complaining to other students about 
the evaluation and authority system of the school.

The potential consequences of inconsistencies in systems of eval- 
uation and the likelihood that such inconsistencies are widespread 
make it particularly important to consider evidence on how different 
components of evaluation systems might fit together to produce a 
coherent evaluation process. The best evidence of such systems 
comes from formal programs and policies rather than from studies of 
particular elements of evaluation processes.

The Impact of Programs and Policies 

Even though major educational programs and policies seldom have 
an explicit focus on evaluation, consideration of programs and poli- 
cies that might affect evaluation processes in schools and class- 
rooms provides a perspective different from those studies of evalua- 
tion processes reviewed thus far. These comprehensive programs 
typically address (at least implicitly) multiple elements of the 
evaluation process as opposed to individual features. 

Most major programs include a rationale which involves some 
statement of purpose. Many programs entail a conception of the 
nature of academic tasks in schools and classrooms, a particular 
type of standard for performance, and guidelines for the type of 
feedback that students should receive. Several major programs can
be considered for their effects on the evaluation practices of edu- 
cators and ultimately on student learning both to illustrate the 
utility of the model of evaluation processes specified earlier and 
to reveal more about the implications of the programs for evaluation
processes.

Table 4 presents a summary of the implications of three major
programs or policies for evaluation processes in schools and class- 
rooms. 



Insert Table 4 About Here 




Minimum competency testing programs are enacted for the purposes 
of certification of students. They tend to involve relatively sim-
ple tasks with time limits on performance and absolute standards.
Such programs are based on infrequent samples of performance, rely
on appraisals prepared by individuals other than the immediate 
teacher of the subject, and utilize simple feedback to students. 

Mastery learning programs are implemented for the purpose of pro-
viding students with direction. They tend to structure the curricu-
lum in terms of relatively small discrete tasks with criteria that
do not involve time limits on performance but do involve absolute 
standards. Mastery learning programs are based on quite frequent 
samples of performance, provide "A's" for all students who master 
the material, and utilize frequent and differentiated feedback to
orient students to their accomplishments and remaining needs.

Public Law 94-142 was enacted to require individualized instruc-
tion and evaluation for handicapped students in the least restric-
tive environment by providing greater direction to such students.
The policy implies individualized tasks with non-specified criteria
and individually referenced standards. Further, P.L. 94-142 envi-
sions frequent sampling of student performance and frequent feedback
to students of the appraisals of their individual teachers. 

This brief analysis of the evaluation implications of these three 
major educational programs and policies reveals several advantages of
the application of the evaluation framework. First, examining pro-
grams in terms of the elements of the evaluation process allows for a
clear specification of the purposes of different programs so that
programs with different purposes are not as likely to be examined for
effects they are not designed to have. For example, Table 4 makes it
clear that a study comparing a mastery learning program to a minimum
competency testing program could not fairly look for the same effects
from both programs. 

Second, considering programs and policies in terms of elements of
the evaluation process allows identification of areas in which the 
programs and policies carry few implications for evaluation systems 
and thus areas where differences in practice may lead to quite 
different outcomes from the same program. For example, none of the 
three programs in Table 4 carry very specific implications for eval- 
uation criteria. As a result individual implementations of these 
programs might vary considerably and produce quite different out- 
comes from what are ostensibly the same type of programs. 

Third, examining programs in terms of the elements of the evalua-
tion process facilitates the identification of conflicts between
different programs when they are implemented simultaneously. For 
example, in some states teachers are simultaneously subjected to the 
requirements of minimal competency testing programs and P.L. 94-142. 
The former attempts to implement absolute standards; the latter man- 
dates individually referenced standards. Teachers are likely to 
experience considerable conflict trying to comply with both programs
(Bender, 1984).

Overall, the evaluation framework provides one way to link gen-
eral educational programs and policies to the specific practices of
local educators in schools and classrooms. Analyzing newly proposed 
educational programs and policies for their implications for class- 
room evaluation processes should reveal much about the problems of 
implementation as well as about the likely effects of such programs 
on students. 
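
The kind of analysis described above can be sketched by recording each
program's implications for the elements of the evaluation process and
comparing programs element by element. The entries below paraphrase the
prose descriptions of the three programs given earlier in this section
(not Table 4 itself), and the names `ELEMENTS`, `PROGRAMS`, and
`differing_elements` are hypothetical. A difference flagged by this
simple comparison is only a candidate for conflict, to be judged by a
reader.

```python
ELEMENTS = ("purpose", "tasks", "criteria", "standards",
            "sampling", "appraisal", "feedback")

# None marks an element for which the program carries no specific implication.
PROGRAMS = {
    "minimum competency testing": {
        "purpose": "certification",
        "tasks": "relatively simple",
        "criteria": "time limits",
        "standards": "absolute",
        "sampling": "infrequent",
        "appraisal": "by someone other than the immediate teacher",
        "feedback": "simple",
    },
    "mastery learning": {
        "purpose": "direction",
        "tasks": "small and discrete",
        "criteria": "no time limits",
        "standards": "absolute",
        "sampling": "frequent",
        "appraisal": "A's for all who master the material",
        "feedback": "frequent and differentiated",
    },
    "P.L. 94-142": {
        "purpose": "direction",
        "tasks": "individualized",
        "criteria": None,
        "standards": "individually referenced",
        "sampling": "frequent",
        "appraisal": "by individual teachers",
        "feedback": "frequent",
    },
}


def differing_elements(first, second):
    """Elements on which both programs take a position and the positions differ."""
    a, b = PROGRAMS[first], PROGRAMS[second]
    return [e for e in ELEMENTS
            if a[e] is not None and b[e] is not None and a[e] != b[e]]


print(differing_elements("minimum competency testing", "P.L. 94-142"))
```

Applied to minimum competency testing and P.L. 94-142, the comparison
flags standards (absolute versus individually referenced) among the
differing elements, and it leaves out criteria because the latter
policy does not specify them.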

Conclusions 

Evaluation processes in schools and classrooms are both complex
in their organization and powerful in their effects on the lives of
educators and students. This review demonstrates that despite exa-
mination of elements of evaluation systems by practitioners and
researchers, a comprehensive and powerful conceptual framework to 
facilitate the study of student evaluation practices and their 
effects has not yet been developed. The framework described here is 
a first step in that direction. Further refinement and elaboration 
may permit more keenly drawn conclusions about evaluation processes.
Students, teachers, and administrators will continue to encounter
the influences of evaluation processes as they work in schools and 
classrooms. Educational researchers will continue to have their 
studies affected by various evaluation practices. The only question 
is whether practitioners or researchers will mount the effort to 
secure greater understanding of and control over the evaluation pro- 
cesses that affect us all. 




FOOTNOTES 



<1> Dornbusch and Scott (1975) note that the term "task conceptions"
represents a compromise between the notion of a task as completely 
objective and the notion of the understanding of a task as com- 
pletely subjective. 

<2> This overly empiricist approach is merely a specific manifesta-
tion of a more general phenomenon identified by Lakatos (1971).

<3> The idea that there is a danger in allowing the match between 
instructional materials and test items (Linn, 1983:187) could only
arise in situations in which there is a failure to develop genuine
criteria separate from objectives rooted in measurable behavior.
Such conditions arise because of the overly empiricist approach in
which there is a one-to-one correspondence between criteria and
indicators. 

<4> The term "criterion-referenced testing" combines the word "cri-
terion," the concept of a standard, and a technique of sampling all 
in a single phrase. 






REFERENCES 



Abramson, L.Y., Seligman, M.E.P., & Teasdale, J.D. (1978).
Learned helplessness in humans: Critique and reformulation.
Journal of Abnormal Psychology, 87, 49-74.

Ahmann, J., & Clock, M.L. (1967). Evaluating pupil growth:
Principles of tests and measurements (fourth edition).
Boston: Allyn and Bacon.

Airasian, P.W., & Madaus, G.F. (1983). Linking testing and
instruction: Policy issues. Journal of Educational
Measurement, 20, 103-118.

Allington, R.L. (1978). Teacher ability in recording oral reading
performance. Academic Therapy, 187-192.

Armbruster, B.B., Stevens, R.J., & Rosenshine, B. (1977). Analyzing
content coverage and emphasis: A study of three curricula
and two tests. Technical Report Number 26. Urbana:
Center for the Study of Reading, University of Illinois.

Arnold, N. (1983). Statistical models of fairness and their
impact on non-biased assessment. Diagnostique, 150-158.

Atkinson, J.W. (1958). Towards experimental analysis of human
motivation in terms of motives, expectancies, and incentives.
In J.W. Atkinson (ed.), Motives in fantasy, action
and society. Princeton, NJ: Van Nostrand.

Barnhard, J.W., Zimbardo, P.G., & Sarason, S.B. (1968). Teachers'
ratings of student personality traits that relate to IQ and
social desirability. Journal of Educational Psychology, 59,
128-132.

Beady, C.J., Slavin, R.E., & Fennessey, G.M. (1981). Alternative
student evaluation structures and a focused schedule of
instruction in an inner-city junior high school. Journal of
Educational Psychology, 73, 518-523.

Beady, C., & Slavin, R. (1981). Making success available to all
students in desegregated schools: An experiment. Paper presented
at the annual meeting of the American Educational
Research Association, Los Angeles.

Beck, M.D., & Stets, P.P. (1979). Teacher opinions of standardized
test use and usefulness. Paper presented at the annual
meeting of the American Educational Research Association,
San Francisco, April.






Beggs, D.G., Mayer, R., & Lewis, E.L. (1972). The effects of
various techniques of interpreting test results on teacher
perception and pupil achievement. Measurement and Evaluation
in Guidance, 290-297.

Berger, J., Cohen, B.P., & Zelditch, M. (1972). Status characteristics
and social interaction. American Sociological
Review, 241-255.

Berk, R.A. (1976). Determination of optional cutting scores in
criterion-referenced measurement. Journal of Experimental
Education, 4-9.

Bidwell, C.E. (1965). The school as a formal organization. Pp.
972-1022 in J.G. March (ed.), Handbook of organizations.
Chicago: Rand McNally.

Bloom, B.S. (1968). Learning for mastery. Evaluation Comment,
1, 2.

Bloom, B.S. The new direction in educational research: alterable
variables. Phi Delta Kappan, 61, 382-385.

Bolocofsky, D.N., & Mescher, S. (1984). Student characteristics:
Using student characteristics to develop effective grading
practices. The Directive Teacher, 11-23.

Borko, H., & Shavelson, R.J. (1978). Teachers' sensitivity to
the reliability of information in making causal attributions
in an achievement situation. Journal of Educational
Psychology, 70, 271-279.

Bresee, C.W. (1976). On "Grading on the Curve." The Clearing
House, 50, 108-110.

Brookover, W.B., & Schneider, J.M. (1975). Academic environments
and elementary school achievement. Journal of Research and
Development in Education, 82-91.

Brophy, J., & Evertson, C. (1981). Student Characteristics and
Teaching. New York: Longman.

Brown, D.J. (1971). Appraisal Procedures in the Secondary 
Schools. Englewood Cliffs, NJ: Prentice-Hall. 

Brown, A., & Craig, R.P. (1977). Grading testing and grading.
Elementary School Journal, 395-399.

Burton, N.W. (1978). Societal standards. Journal of Educational
Measurement, 15, 263-271.

Buys, N., & Winfield, A.H. (1982). Learned helplessness in high
school students following experience of noncontingent
rewards. Journal of Research in Personality, 6-9.




Chansky, N.N. (1975). A critical examination of school report
cards from K through 12. Reading Improvement, 12, 184-192.

Crano, W.D., & Mellon, P.M. (1978). Causal influence of teachers'
expectations on children's academic performance: A
cross-lagged panel analysis. Journal of Educational
Psychology, 70, 39-49.

Crooks, A.D. (1933). Marks and marking systems: A digest. Journal
of Educational Research, 27, 259-272.

Cross, L.J., & Cross, G.M. (1980). Teachers' evaluative comments
and pupil perception of control. Journal of Experimental
Education, 68-71.

Culler, J. Literary competence. Pp. 101-117 in J.P. Thompkins
(ed.), Reader-response criticism: From formalism to
post-structuralism. Baltimore, MD: Johns Hopkins University
Press.

Davis, R.B., & McKnight, C. (1976). Conceptual, heuristic, and
S-algorithmic approaches in mathematics teaching. Journal
of Children's Mathematical Behavior, 1 (Supplement 1),
271-286.

Deutsch, M. (1979). Education and distributive justice: Some
reflections on grading systems. American Psychologist,
391-401.



Dornbusch, S.M., & Scott, W.R. (1975). Evaluation and the
exercise of authority. San Francisco: Jossey-Bass.

Doyle, W. (1983). Academic work. Review of Educational
Research, 53, 159-199.

Doyle, W. (1979). The tasks of teaching and learning in classrooms.
Research and Development Report Number 4103. Austin,
Texas: Research and Development Center for Teacher
Education, University of Texas.

Ediger, M. (1975). Reporting pupil progress: Alternatives to
grading. Educational Leadership, 32, 265-267.

Egan, O., & Archer, P. (1985). The accuracy of teachers' ratings
of ability: A regression model. American Educational
Research Journal, 22, 25-34.

Elmore, P., & Beggs, D.L. (1972). Stability of teacher ratings
of pupil behavior in a classroom setting. Paper presented
at the meetings of the American Personnel and Guidance
Association, March.






Evans, E.D., & Engelberg, R.A. (1985). A developmental study of
student perceptions of school grading. Paper presented at
the Biennial Meeting of the Society for Research on Child
Development, Toronto.

Feldhusen, J.F. (1964). Student perceptions of frequent quizzes
and post-mortem discussions of tests. Journal of Educational
Measurement, 1, 51-54.

Fennessey, J. (1973). The "Focused Flexibility and GREG" project
at Walbrook High School. Summary Report. Baltimore, Maryland:
Center for the Social Organization of Schools.

Feshbach, N.D. (1969). Student teacher preferences for elementary
school pupils varying in personality characteristics.
Journal of Educational Psychology, 60, 126-132.

Frase, L.T. (1972). Maintenance and control in the acquisition
of knowledge from written materials. In J.B. Carroll and
R.O. Freedle (eds.), Language Comprehension and the
Acquisition of Knowledge. Washington, DC: Winston.

Frase, L.T. (1975). Prose processing. Pp. 1-48 in G.H. Bower
(ed.), The Psychology of Learning and Motivation, Volume 9.
New York: Academic Press.

Frederiksen, C.H., & Dominic, J. (1981). Writing: The Nature,
Development, and Teaching of Written Communication (volume
2). Hillsdale, NJ: Erlbaum.

Fuchs, L.S., Deno, S.L., & Mirkin, P.K. (1984). The effects of
frequent curriculum-based measurement and evaluation on
pedagogy, student achievement, and student awareness of
learning. American Educational Research Journal, 21,
449-460.

Gaston, N. (1976). Evaluation in the affective domain. Journal
of Business Education, 134-136.

Geisinger, K.F., & Rabinowitz, W. (1980). Individual differences
among college faculty in grading. Journal of Instructional
Psychology, 20-27.

Giannangelo, D.M. (1975). Make report cards meaningful. The
Educational Forum, 39, 409-415.

Giannangelo, D.M., & Lee, K.Y. (1974). At last: Meaningful
report cards. Phi Delta Kappan, 55, 630-631.

Gibb, G.D. (1983). Influence of "halo" and "demon" effects in
subjective grading. Perceptual and Motor Skills, 67-70.






Glaser, R. (1963). Instructional technology and the measurement
of learning outcomes. American Psychologist, 18, 519-521.

Glass, G.V. (1978). Standards and criteria. Journal of Educational
Measurement, 15, 237-261.

Gronlund, N.E. (1971). Measurement and Evaluation in Teaching, 
(second edition). New York: Macmillan. 

Gullickson, A.R. (1982). The practice of testing in elementary 
and secondary schools. Unpublished Report. ED229391. 

Gullickson, A.R. (1984). Teacher perspectives of their instructional
use of tests. Journal of Educational Research,
244-248.

Hackman, J.R. (1969). Toward understanding the role of tasks in
behavioral research. Acta Psychologica, 97-128.

Hambleton, R.K., Swaminathan, H., Algina, J., & Coulson, D.B.
(1978). Criterion-referenced testing and measurement: A
review of technical issues and developments. Review of
Educational Research, 48, 1-47.

Hansen, J.M. (1977). Personalized achievement reporting: Grades
that are significant. The High School Journal, 255-263.

Heller, L. (1978). Assessing the process and the product: An
alternative to grading. English Journal, 66-69.

Hempel, C.G. (1952). Fundamentals of Concept Formation in
Empirical Science. Chicago: University of Chicago Press.

Herman, J., & Dorr-Bremme, D.W. (1984). Testing and assessment
in American public schools: Current practices and directions
for improvement. Los Angeles: Center for the Study
of Evaluation, University of California at Los Angeles.

Holmes, M. (1978). Evaluating students in the affective domain.
School Guidance Worker, 50-58.

Holtz, R.E. (1976). More than a letter grade. Social Education,
40, 23-24.

Jackson, G.B. (1975). The research evidence on the effects of
grade retention. Review of Educational Research, 45.



Jarrett, C.D. (1963). Marking and reporting practices in the
American secondary school. Peabody Journal of Education,
41, 36-48.



Jenkins, J.J. (1977). Remember that old theory of memory? Well,
forget it! Pp. 413-430 in R. Shaw and J. Bransford (eds.),
Perceiving, acting, and knowing: Toward an ecological
psychology. Hillsdale, NJ: Erlbaum.

Johnson, J.R. (1984). Synthesis of research on grade retention
and social promotion. Educational Leadership, 41, 66-68.

Krathwohl, D.R., & Payne, D.A. (1971). Defining and assessing
educational objectives. Pp. 17-45 in Robert L. Thorndike
(ed.), Educational Measurement (second edition). Washington,
DC: American Council on Education.

Ladd, E. (1961) . A comparison of two types of training with 
reference to developing skill in diagnostic oral reading 
testing. Unpublished doctoral dissertation, Florida State 
University, 1961. 

Lakatos, I. (1971). Pp. 91-196 in I. Lakatos and A. Musgrave
(eds.), Criticism and the growth of knowledge. New York:
Cambridge U. Press.

Lawler, E.E. (1976). Control systems in organizations. Pp.
1247-1291 in M.D. Dunnette (ed.), Handbook of Industrial and
Organizational Psychology. Chicago: Rand McNally.

Leinhardt, G., & Seewald, A.M. (1981). Overlap: What's tested,
what's taught? Journal of Educational Measurement, 18,
85-96.

Levin, I.P., Ims, J.R., & Vilmain, J.A. (1980). Information
variability and reliability effects in evaluating student
performance. Journal of Educational Psychology, 72, 355-361.

Levine, M. (1976). The academic achievement test: Its historical
context and social functions. American Psychologist,
228-238.

Levine, A., & Levine, M. (eds.). (1970). The Gary Schools.
Epilogue by Abraham Flexner and Frank P. Bachman. Cambridge,
MA: MIT Press.

Lien, A.J. (1967). Measurement and evaluation of learning: A
handbook for teachers. Dubuque, IA: Wm. C. Brown.

Lindquist, E.F. (1969). The impact of machines on educational
measurement. Pp. 351-369 in R.W. Tyler (ed.), Educational
evaluation: New roles, new means. The 68th Yearbook of the
National Society for the Study of Education, Part II.
Chicago: University of Chicago Press.

Lindvall, C.M. (1961). Testing and evaluation: An introduction.
NY: Harcourt, Brace, and World.






Linn, R.L. (1983). Testing and instruction: Links and
distinctions. Journal of Educational Measurement, 20,
179-189.

Lintner, K.C., & Ducette, J. (1974). The effects of locus of
control, academic failure and task dimensions on a student's
responsiveness to praise. American Educational Research
Journal, 11, 231-239.

Lissman, U., & Paetzold, B. (1983). Achievement feedback and its
effects on pupils: A quasi-experimental and longitudinal
study of two kinds of differential feedback, norm-referenced
and criterion-referenced feedback. Studies in Educational
Evaluation, 209-222.

Locke, E.A. (1968). Toward a theory of task motivation and
incentives. Organizational Behavior and Human Performance,
3, 157-189.

Mager, R.F. (1962). Preparing instructional objectives. Palo
Alto, CA: Fearon.

Mayo, S.T. (December 1967). Pre-service preparation of teachers
in educational measurement. Chicago: Loyola University.

McDill, E.L., Natriello, G., & Pallas, A. (1985). Raising standards
and retaining students. Review of Educational
Research, 55, 415-434.

Meskauskas, J.A. (1976). Evaluation models for criterion-referenced
testing: Views regarding mastery and standard setting.
Review of Educational Research, 46, 133-158.

Meyer, N., Bachmann, M., Biermann, U., Hempelmann, N., Ploget,
F., & Spiller, H. (1979). The informational value of evaluative
behavior: Influences of praise and blame on perceptions
of ability. Journal of Educational Psychology, 71,
259-268.

Michaels, J.W. (1977). Classroom reward structures and academic
performance. Review of Educational Research, 47, 87-98.

Morine-Dershimer, G. (1979). How teachers see their pupils.
Educational Research Quarterly, 43-53.

National Commission on Excellence in Education. (1983). A nation
at risk: The imperative for educational reform. Washington,
DC: U.S. Government Printing Office.

National Institute of Education. (1979). Testing, teaching, and 
learning. Report of a Conference on Research on Testing. 
Washington, DC: National Institute of Education. 






Natriello, G. (1985). Merit pay for teachers: The implications
of theory for practice. In H.C. Johnson, Jr. (ed.), Merit,
money and teachers' careers. Lanham, MD: University Press
of America.

Natriello, G. (1982). Organizational evaluation systems and student
disengagement in secondary schools. St. Louis, MO:
Washington University, Final Report to the National Institute
of Education.

Natriello, G., & Dornbusch, S.M. (1984). Teacher evaluative
standards and student effort. New York: Longman.

Natriello, G., & McDill, E.L. (1986). Performance standards,
student effort on homework and academic achievement.
Sociology of Education, 59, 18-31.

Natriello, G., & McPartland, J. (1987). Adjustments in High
School Teachers' Grading Criteria: Accommodation or Motivation?
Paper presented at the Annual Meeting of the American
Educational Research Association, Washington, D.C., April.

Nearine, R.J. (1970). The test, the time, and the teacher.
Measurement and Evaluation in Guidance, 214-216.

Nedelsky, L. (1954). Absolute grading standards for objective
tests. Educational and Psychological Measurement, 14, 3-19.

Olejnik, S.E. (April 1979). Standardized achievement programs
viewed from the perspective of non-measurement specialist.
Paper presented at the annual meeting of the National Council
on Measurement in Education, San Francisco.

Openshaw, K. (1967). A failure of the Minnesota Teacher Attitude
Inventory to relate to teacher behavior. Journal of Teacher
Education, 233-239.

O'Regan, M., Airasian, P., & Madaus, G. (April 1979). The use of
standardized test information by Irish teachers. Paper presented
at the annual meeting of the National Council on
Measurement in Education, San Francisco.

Oren, D.L. (1983). Evaluation systems and attributional tendencies
in the classroom: A sociological approach. Journal of
Educational Research, 307-312.

Page, E.B. (1958). Teacher comments and student performance: A
seventy-four classroom experiment in school motivation.
Journal of Educational Psychology, 49, 173-181.



Page, W., & Carlson, K. (1975). The process of observing oral
reading scores. Reading Horizons, 147-150.



Peckham, P.D., & Roe, M.D. (1977). The effects of frequent
testing. Journal of Research and Development in Education,
40-50.

Pedulla, J.J., Airasian, P.W., & Madaus, G.F. (1980). Do teacher
ratings and standardized test results of students yield the
same information? American Educational Research Journal,
17, 303-307.

Pedulla, J.J., Airasian, P., Madaus, G., & Kellaghan, T. (1977).
Proportion and direction of teacher rating changes of
pupils' progress attributable to standardized test information.
Journal of Educational Psychology, 69, 702-709.

Pestalozzi, H. Über den Aufenthalt in Stans (1807). In L.W.
Seyffarth (ed.), Pestalozzi's sämtliche Werke, Bd. 8.
Liegnitz: Seyffarth, 1900.

Poole, R.L. (1979). Evaluating and victimizing elementary school
children. Education, 115-120.

Poole, R.L. (1976). A teacher-pupil dilemma: Student evaluation
and victimization. Adolescence, 341-347.

Popham, W.J. (1973). Establishing performance standards.
Englewood Cliffs, NJ: Prentice-Hall.

Popham, W.J., & Husek, T.R. (1969). Implications of criterion-
referenced measurement. Journal of Educational Measurement,
6, 1-9.

Purkey, S.C., & Smith, M.S. (1983). Effective schools: A review.
Elementary School Journal, 83, 427-452.

Remmers, H.H., Gage, N.L., & Rummel, J.F. (1960). A practical
introduction to measurement and evaluation. New York: Harper
and Brothers.

Rheinberg, F. (1983). Achievement evaluation: A fundamental
difference and its motivational consequences. Studies in
Educational Evaluation, 185-194.

Rimmington, G.T. (1977). Evaluation in history and the social
sciences: The longitudinal aspect and its problems. History
and Social Science Teacher, 12, 207-211.

Roeder, B.R. (1973). Teacher education curriculum — your final 
grade is F. Journal of Educational Measurement, 10,
141-143.

Rosenholtz, S.J., & Rosenholtz, S.H. (1981). Classroom organization
and the perception of ability. Sociology of Education,
54, 132-140.




Rosenholtz, S.J., & Wilson, B. (1980). The effect of classroom
structure on shared perceptions of ability. American Educational
Research Journal, 17, 75-82.

Rosenshine, B., & Furst, N. (1973). The use of direct observation
to study teaching. Pp. 122-183 in R.M.W. Travers
(ed.), Second handbook of research on teaching. Chicago:
Rand McNally.

Rosswork, S.G. (1977). Goal setting: The effects on an academic
task with varying magnitudes of incentive. Journal of
Educational Psychology, 69, 710-715.

Roth, S., & Bootzin, R.R. (1974). The effects of experimentally
induced expectancies of external control: An investigation
of learned helplessness. Journal of Personality and Social
Psychology, 29, 253-264.

Rudman, H.C., Kelly, J.L., Wanous, D.S., Mehrens, W.A., Clark,
C.M., & Porter, A.C. (1980). Integrating assessment with
instruction: A review (1922-1980). East Lansing, MI:
Institute for Research on Teaching, Michigan State University,
College of Education.

Rudman, M.K. (1978). Evaluating students: How to do it right.
Learning, 2, 50-53.

Ryan, K.M., & Levine, J.N. (1981). Impact of academic performance
pattern on assigned grade and predicted performance.
Journal of Educational Psychology, 73, 386-392.

Salganik, L.H. (1982). The effects of effort marks on report
card grades. Paper presented at the Annual Meeting of the
American Educational Research Association, Los Angeles.

Sartore, R.L. (1975). Grading: A searching look. Educational
Leadership, 261-273.

Schunk, D.H. (1983). Reward contingencies and the development of
children's skills and self-efficacy. Journal of Educational
Psychology, 75, 511-514.

Scott, W.R. (1981). Organizations: Rational, natural and open
systems. Englewood Cliffs, NJ: Prentice-Hall.

Scriven, M. (1978). How to anchor standards. Journal of Educational
Measurement, 15, 273-275.

Seligman, M.E.P. (1975). Helplessness: On depression, development,
and death. San Francisco: Freeman.

Shephard, L.A. (1976). Setting standards and living with them.
Paper presented at the annual meeting of the National Council
on Measurement in Education, San Francisco.



Simpson, C. (1981). Classroom structure and the organization of
ability. Sociology of Education, 54, 129-132.

Slavin, R.E. (1977). Classroom reward structure: An analytical
and practical review. Review of Educational Research, 47,
633-650.

Slavin, R.E. (1978). Separating incentives, feedback, and evaluation:
Toward a more effective classroom system. Educational
Psychologist, 97-100.

Smith, L.R. (1984). Effect of teacher vagueness and use of lecture
notes on student performance. Journal of Educational
Research, 68-74.

Solo, L. (1977). What we do because testing doesn't work.
National Elementary Principal, 63-66.

Sorotskin, P., Fleming, E., & Anttonen, R. (1974). Teacher knowledge
of standardized test information and its effects on pupil
I.Q. and achievement. Journal of Experimental Education,
79-85.

Starch, D., & Elliott, E.C. (1912). Reliability of the grading
of high school work in English. The School Review, 20,
442-457.

Stewart, H.J. (1975). A multi-dimensional evaluating-reporting
system in the elementary school. Reading Improvement, 12,
174-176.

Stewart, L.G., & White, M.A. (1976). Teacher comments, letter
grades and student performance: What do we really know?
Journal of Educational Psychology, 68, 488-500.

Stockhard, J., Lang, D., & Wood, W. (1985). Academic merit, status
variables, and students' grades. Journal of Research
and Development in Education, 12-20.



Symonds, P.M. (1925). Notes on rating. Journal of Applied
Psychology, 9, 188-195.

Terwilliger, J.G. (1977). Assigning grades: Philosophical
issues and practical recommendations. Journal of Research
and Development in Education, 21-39.

Thompson, J.D. (1967). Organizations in action. New York:
McGraw-Hill.

Thorndike, R.L. (1969). Marks and marking systems. Pp. 759-766
in R.L. Ebel (ed.), Encyclopedia of educational research.
New York: Macmillan.






Thornton, J.W., & Jacobs, P.D. (1972). The facilitating effects
of prior inescapable unavoidable stress on intellectual
performance. Psychometric Science, 265-271.

Tolor, A., Scarpetti, W.L., & Lane, P.A. (1967). Teachers' attitudes
toward children's behavior revisited. Journal of
Educational Psychology, 58, 175-180.

Tyler, R.W. (1973). Testing for accountability. In A.C. Ornstein
(ed.), Accountability for teachers and school
administrators. Belmont, CA: Fearon.

Varenne, H., & Kelly, M. (1976). Friendship and fairness:
Ideological tensions in an American high school. Teachers
College Record, 77, 601-614.

Waller, W. (1932). The sociology of teaching. New York: Wiley.

Walling, D.R. (1975). Designing a "report card" that communicates.
Educational Leadership, 32, 258-260.

Ward, J.G. (1981). Testing and teaching: Partners in learning.
Peabody Journal of Education, 91-95.

Warries, E. (1982). Relative measurement and the selective
philosophy in education. Evaluation in Education, 5, 191-202.

Weiner, B. (1979). A theory of motivation for some classroom
experiences. Journal of Educational Psychology, 71, 3-25.

Webster, M., & Entwisle, D.R. (1979). Expectation effects on
performance expectations. Social Forces, 55, 493-503.

Williams, R.G., Pollack, M.J., & Ferguson, N.A. (1975). Differential
effects of two grading systems on student performance.
Journal of Educational Psychology, 67, 253-258.

Wilson, R.J. (1977). Three faces of evaluation: Students, teachers,
curriculum. History and Social Science Teacher, 12,
203-206.

Wilson, S. (1976). You can talk to teachers: Student-teacher
relations in an alternative high school. Teachers College
Record, 78, 77-100.

Wise, R.I., & Newman, B. (1975). The responsibilities of grading.
Educational Leadership, 32, 253-256.

Wortman, C.B., & Brehm, J.W. (1975). Responses to uncontrollable
outcomes: An integration of reactance theory and the
learned helplessness model. In L. Berkowitz (ed.), Advances
in experimental social psychology (volume 8). New York:
Academic Press.






Yeh, J.P. (1978). Test use in schools. Washington, DC: U.S.
Department of Health, Education and Welfare and National
Institute of Education.

Zahorik, J.A. (1968). Classroom feedback behavior of teachers.
Journal of Educational Research, 62, 147-150.






Figure 1 

A Model of Evaluation Processes in Schools and Classrooms

(1) Establishing the Purposes for Evaluating Students
(2) Assigning Tasks to Students
(3) Setting Criteria for Student Performance
(4) Setting Standards for Student Performance
(5) Sampling Information on Student Performance
(6) Appraising Student Performance
(7) Providing Feedback to Student Performers
(8) Monitoring Outcomes of the Evaluation of Students






Table 1 



List of the Purposes of Evaluation in Schools and Classrooms



1) to assess educational equity (Airasian and Madaus, 1983)

2) to produce evidence on school and program effectiveness, curricular
methods, procedures, etc. (Airasian & Madaus, 1983; Ward, 1981;
Ahmann and Clock, 1967; Lien, 1967)

3) to guide funds allocation (Airasian and Madaus, 1983)

4) to evaluate teachers (Airasian & Madaus, 1983)

5) to classify students, to assign them to particular programs, and
provide instructional guidance (Airasian and Madaus, 1983; Fennessey, 1973;
Lien, 1967; Remmers, Gage, and Rummel, 1960)

6) to assess competencies to certify successful high school completion and
grade promotion (Hambleton, Swaminathan, Algina, & Coulson, 1978;
Jackson, 1975; Johnson, 1984; Levine, 1976)

7) to structure better teaching procedures and improve instruction
(Fuchs, Deno, and Mirkin, 1984; Ward, 1981; Linn, 1983)

8) to provide better feedback to students and allow them to discover
their own abilities, their strengths and weaknesses (Fuchs, Deno, &
Mirkin, 1984; Wilson, 1977; Linn, 1983)

9) to monitor individual progress (Hambleton, Swaminathan, Algina, &
Coulson, 1978)

10) to diagnose learning deficiencies (Hambleton, Swaminathan, Algina, &
Coulson, 1978; Ahmann & Clock, 1967)

11) to select students for certain educational and occupational opportunities
(Levine, 1976; Lien, 1967)

12) to motivate students (Wise and Newman, 1975; Lien, 1967)

13) to report to parents (Wise and Newman, 1975)

14) to provide feedback to teachers on what students have and have not learned;
to guide future teaching (Linn, 1983; Lien, 1967)

15) to flag or identify those items in a curriculum that are particularly
important (Linn, 1983)

16) to establish standards and maintain standards (Sartore, 1975; Lien, 1967)






17) to select students for limited positions in programs, institutions,
and occupations (Ward, 1981; Warries, 1982; Levine, 1976)

18) to predict future academic success (Wilson, 1977; Warries, 1982)

19) to assess the academic achievement of individual pupils (Ahmann &
Clock, 1967)

20) to assess the educational progress of large populations to guide
educational policy (Ahmann and Clock, 1967)

21) to furnish instruction to students (Lien, 1967)

22) to adapt instruction to the different needs of individual students
(Remmers, Gage, and Rummel, 1960)

23) to provide personal (educational, vocational, social, emotional)
guidance to students (Remmers, Gage, and Rummel, 1960)

24) to improve public relations through reports to parents and staff
(Remmers, Gage, & Rummel, 1960)

25) to enforce the authority and control of the school over students
(Natriello, 1982)






Table 2 



Percentages of Principals Reporting The Use of Test Results and Other
Information on Student Performance as Crucial or Important for Specific
Purposes in the School By School Level (Elementary/Secondary)
(As Reported by Herman and Dorr-Bremme, 1984)

Tests and Other Information Sources

Purpose                         Norm-        Minimum      District      Teachers'
                                Referenced   Competency   Objectives-   Tests and
                                Tests        Tests        Based Tests   Assignments

Curriculum Planning             78/74        60/75        65/57         72/63
Assigning Students to Classes   47/72        30/64        38/45         74/75
Teacher Evaluation              16/20        11/15        25/21         40/43
Allocating Funds                28/24        21/28        29/21
Student Promotion               51/24        36/48        48/26         84/84
Informing the Public            72/74        38/63        41/43         42/47
Communicating to Parents        78/79        56/69        63/45         98/96
Reporting to District           81/86        55/72        58/56         53/60

Teachers' Opinions/Judgements and Other Sources (values as printed, in source order):
88/84, 84/80, 81/94, 96/76, 49a/76a, 100b/95b, 77c/84c, 94d/96d, 95/94, 92e/97f, --/--

-- = not asked
a = students' past classroom behavior
b = observations of teachers' teaching
c = specific directions from district
d = classwork throughout the year
e = observations of the student
f = student's report card grades






Table 3 



Number and Content of Rating Categories in Reports Used by 312 School
Districts Surveyed by Chansky (1975)
(Reported in Percent by Grade Level for Academic and Dispositional Categories)

                                        Grade Levels
                     K           1-3         4-6         7-9         10-12
Categories           Acad/Disp   Acad/Disp   Acad/Disp   Acad/Disp   Acad/Disp

A) Number of Steps
   1                 13/11       --/--       --/43       --/20       --Ilk
   2                 10/--       14/35       --/12       12/20       10/14
   3                 44/19       33/27       19/26       11/40       --IV
   4                 --/--       15/15       10/9                    --/--
   5                 --/--       31/20       60/9        69/13       1912k

B) Content
   Adequacy          32/16       42/67       56/30       20/50       11/37
   Position                                  16/60       16/15       19/18
   Prestige                      --/--       12/--       28/23       23/23
   Passage                                               26/--       31/10
   Presence          16/--
   Endorsement                                                       16/12

Adequacy (e.g., satisfactory)
Position (e.g., average, above average, below average)
Prestige (e.g., excellent, outstanding)
Passage (e.g., pass-fail)
Presence (e.g., all of the time, frequently, not yet)
Endorsement (e.g., superior, good, poor, inferior)






Table 4

The Implications of Selected Programs and Policies on Aspects of the Evaluation Process in Schools and Classrooms

                                        Elements of the Evaluation Process

Programs &     Purposes        Tasks            Criteria        Standards      Samples      Appraisal      Feedback
Policies

Minimum        Certification   Simple           Time Bound      Absolute       Infrequent   Removed        Simple
Competency                                                                                  from Teacher   Pass/Fail
Testing

Mastery        Direction       Small            Time Bound      Absolute       Frequent     A's for        Differentiated,
Learning                                                                                    Mastery        Frequent

P.L. 94-142    Direction       Individualized   Not specified   Individually   Frequent     Teacher        Frequent
                                                                Referenced                  Dependent






